NXdata needs additional information for data to be plotted accurately #1527

ggoneiESS · 2025-01-08T19:03:54Z

NXdata states that axis fields should have a shape that matches the data dimension(s), such that a given value obtained at data[i][j][k] can be plotted at some point given by other symbols. Additionally, it states:

The NXdata class is designed to encapsulate all the information required for a set of data to be plotted.

However, it is not possible to use NXdata to plot data which has been integrated without making explicit assumptions. Consider a data set that has already been integrated (e.g. by a beam monitor) into bins of unequal width, recorded in 1 ms periods (with identical counts in each recorded period):

Low Independent | High Independent | Counts
___________________________________________
              0 |                1 |    10
              1 |                3 |    20
              3 |                6 |    30
              6 |               10 |    40
//        10-15 deliberately missed out
             15 |               20 |    50

One could choose the low edge, the high edge, or their mid-point as the coordinate at which to plot each value. Using the FIELDNAME_errors field could help to describe the range of the measurement too, but that contradicts what the specification says the field should represent. Whichever choice is made, either the first or the last bin edge will be missing, and any indication that the data have been integrated will be lost. NXdata therefore cannot represent the data accurately, since the recorded data would currently have to look like:

data[[10, 20, 30, 40, 50],
     [10, 20, 30, 40, 50],
     [10, 20, 30, 40, 50],
     [10, 20, 30, 40, 50],
     [10, 20, 30, 40, 50]]
x[0, 1, 3, 6, 15] // or x[1, 3, 6, 10, 20], or x[0.5, 2, 4.5, 8, 17.5]
t[0, 1000, 2000, 3000, 4000] // or t[1000, 2000, 3000, 4000, 5000], or t[500, 1500, 2500, 3500, 4500]
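The ambiguity can be illustrated with a minimal Python sketch. The variable names are illustrative only and are not part of NeXus. It builds the three candidate x coordinates from the table above and shows that bin widths reconstructed from a single coordinate per bin do not match the true widths:

```python
# Bin edges and counts from the table above (the 10-15 bin is omitted).
low = [0, 1, 3, 6, 15]
high = [1, 3, 6, 10, 20]
counts = [10, 20, 30, 40, 50]

# Three equally plausible choices of a single plotting coordinate per bin.
x_low = low
x_high = high
x_mid = [(lo + hi) / 2 for lo, hi in zip(low, high)]

# True widths, known only because both edges are available here.
widths_true = [hi - lo for lo, hi in zip(low, high)]

# Widths a reader would infer from consecutive low edges alone:
# the omitted 10-15 bin is silently absorbed into the 6-10 bin,
# and the last bin has no width at all.
widths_from_low = [b - a for a, b in zip(x_low, x_low[1:])]

print(x_mid)            # [0.5, 2.0, 4.5, 8.0, 17.5]
print(widths_true)      # [1, 2, 3, 4, 5]
print(widths_from_low)  # [1, 2, 3, 9]
```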

This could be fixed with four additional requirements:

1. Requiring all numeric data to be continuous
2. Specifying whether the value is a point or integrated datum
2b. And, if integrated, whether the value given corresponds to the leading or trailing bin edge*
3. Specifying whether there is an overflow bin at neither end, the first bin, the last bin, or both

The data that is to be represented is equivalent to:

Low Independent | High Independent | Counts
___________________________________________
              0 |                0 |    0   // infinitesimal width bins have zero counts
              0 |                1 |    10
              1 |                3 |    20
              3 |                6 |    30   // bin A
              6 |                6 |    0   // infinitesimal width bins have zero counts BUT this bin implies a discontinuity between bin A and B
              6 |               10 |    40   // bin B
             10 |               15 |    0   // a 'missing' bin is functionally equivalent to a bin with no counts
             15 |               20 |    50
             20 |               20 |    0   // infinitesimal width bins have zero counts

which would be represented with

data[[0,  0,  0,  0,  0,  0,  0],
     [0, 10, 20, 30, 40, 0, 50],    // underflow example: [7317, 10, 20, 30, 40, 0, 50]
     [0, 10, 20, 30, 40, 0, 50],
     [0, 10, 20, 30, 40, 0, 50],
     [0, 10, 20, 30, 40, 0, 50],
     [0, 10, 20, 30, 40, 0, 50]]
x[0, 1, 3, 6, 10, 15, 20]
t[0, 1000, 2000, 3000, 4000, 5000]
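As a rough sketch of how one row of this representation could be derived from the original table, the hypothetical helper below (`to_trailing_edge` is an illustrative name, not a NeXus API) assigns each count to the trailing edge of its bin. It assumes that a gap between bins is filled with an explicit zero-count bin, as argued above:

```python
def to_trailing_edge(low, high, counts, fill=0):
    """Return (edges, values) with values[i] the counts of the bin ending at edges[i].

    values[0] belongs to the first edge, where no bin ends; it holds `fill`
    (or could hold an underflow count).
    """
    edges = [low[0]]
    values = [fill]
    for lo, hi, c in zip(low, high, counts):
        if lo != edges[-1]:
            # Gap between the previous bin and this one:
            # insert an explicit empty bin that ends at `lo`.
            edges.append(lo)
            values.append(fill)
        edges.append(hi)
        values.append(c)
    return edges, values


low = [0, 1, 3, 6, 15]
high = [1, 3, 6, 10, 20]
counts = [10, 20, 30, 40, 50]

x, row = to_trailing_edge(low, high, counts)
print(x)    # [0, 1, 3, 6, 10, 15, 20]
print(row)  # [0, 10, 20, 30, 40, 0, 50]
```

With this convention the axis and each data row have the same length, so no axis needs to be one element longer than the data.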

There may be use cases where it is better to assign the value to the leading edge of the bin (e.g. if there is an overflow bin at the end without underflow), and so the data could also be written as

data[[10, 20, 30, 40, 0, 50, 0],
     [10, 20, 30, 40, 0, 50, 0],
     [10, 20, 30, 40, 0, 50, 0],
     [10, 20, 30, 40, 0, 50, 0],
     [10, 20, 30, 40, 0, 50, 0],
     [0,  0,  0,  0,  0,  0,  0]]

without changing the axes.
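The equivalence of the two conventions along x can be checked with a short sketch. The function names are hypothetical. Both rows decode to the same (low edge, high edge, counts) triples on the unchanged x axis:

```python
x = [0, 1, 3, 6, 10, 15, 20]
trailing_row = [0, 10, 20, 30, 40, 0, 50]


def trailing_to_leading(row, fill=0):
    # Shift every value one slot towards the start; the last edge starts no bin.
    return row[1:] + [fill]


def bins_from_trailing(edges, row):
    # row[i] holds the counts of the bin that ends at edges[i].
    return [(edges[i - 1], edges[i], row[i]) for i in range(1, len(edges))]


def bins_from_leading(edges, row):
    # row[i] holds the counts of the bin that starts at edges[i].
    return [(edges[i], edges[i + 1], row[i]) for i in range(len(edges) - 1)]


leading_row = trailing_to_leading(trailing_row)
print(leading_row)  # [10, 20, 30, 40, 0, 50, 0]

# Both conventions describe the same bins.
assert bins_from_trailing(x, trailing_row) == bins_from_leading(x, leading_row)
```

A reader can only pick the correct decoder if the file states which convention was used, which is the point of requirement 2b.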

I understand that

NXdata provides data and coordinates to be plotted but does not describe how the data is to be plotted

but in this example it is vital to be able to specify that this is not point-like data: the counts per unit bin width are actually equal in every populated bin. Currently, although I can see a way to manipulate the data so that bin edges are defined, there is no way to tell others who read the file what has been done, and the data could easily be misinterpreted as a set of points. For similar reasons, it should be explicit that bins must be contiguous.
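To make the distinction concrete, here is a small sketch using the example values from this issue (variable names are illustrative). Read as points, the counts look like a rising trend. Read as integrated bins and divided by bin width, every populated bin has the same value:

```python
edges = [0, 1, 3, 6, 10, 15, 20]
counts = [10, 20, 30, 40, 0, 50]  # one value per bin; the 10-15 bin is empty

# Point-like reading: the raw counts rise steadily with x.
print(counts)  # [10, 20, 30, 40, 0, 50]

# Integrated reading: divide each count by the width of its bin.
density = [c / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]
print(density)  # [10.0, 10.0, 10.0, 10.0, 0.0, 10.0]
```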

*An alternative is to use the centre of the bin, which is most likely what is required for statistical analysis, but this complicates representation: in the example here, it becomes a bit convoluted to work out how big each bin actually is, and it would require the length of each axis to be 1 element longer than data. Instead, one workaround might be to always use the centre for statistical analysis unless explicitly stated otherwise by giving the keyword twice: "trailing trailing" for analysis which should be filled at, and use, the trailing edge of the bin; "trailing leading" for analysis which should be filled at the trailing edge of the bin but use the leading edge of the bin.


ggoneiESS · 2025-01-08T19:21:03Z

The edit was to the final representation of the data array

PeterC-DLS · 2025-01-09T13:52:45Z

For the all-present bins case, text has been proposed in #1396 to include histogram axes that contain the bin edges - see L328. Not sure what can be a general solution for the missing/omitted bins case. Options include:

if counts are floats then use NaNs
if counts are integers then use 0
use a specific negative/zero value in all cases
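The first two options could be sketched as follows. This is a hypothetical helper (`missing_fill` is an illustrative name), not something defined by NeXus or by #1396:

```python
import math


def missing_fill(counts):
    """Pick a placeholder for an omitted bin, following the options listed above."""
    if all(isinstance(c, int) for c in counts):
        return 0      # integer counts: NaN is not representable, so fall back to 0
    return math.nan   # float counts: NaN marks "not measured"


int_counts = [10, 20, 30, 40, 50]
float_counts = [10.0, 20.0, 30.0, 40.0, 50.0]

# Insert the placeholder where the omitted 10-15 bin would be.
int_row = int_counts[:4] + [missing_fill(int_counts)] + int_counts[4:]
float_row = float_counts[:4] + [missing_fill(float_counts)] + float_counts[4:]

print(int_row)    # [10, 20, 30, 40, 0, 50]
print(float_row)  # [10.0, 20.0, 30.0, 40.0, nan, 50.0]
```

Note that in the integer case a reader cannot tell an omitted bin from a measured bin that happened to contain no counts.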

ggoneiESS · 2025-01-09T15:18:34Z

Thanks Peter - I'm going to close this issue and hopefully the rest of the discussion can be had on that PR. I did look at the first couple of pages of PRs, open and closed, but the linked one was tucked away on page 3!

ggoneiESS · 2025-01-20T10:16:48Z

Discussion in #1396 suggests we re-open this issue.

I would propose that I submit a PR, but only after the axes definition is pushed. Any thoughts @rayosborn ?

ggoneiESS closed this as completed Jan 9, 2025

ggoneiESS reopened this Jan 9, 2025

ggoneiESS closed this as not planned Jan 9, 2025

rayosborn mentioned this issue Jan 9, 2025

Fix: 1381 change to the axes attribute meaning #1396

Open

ggoneiESS reopened this Jan 20, 2025
