Clarify Frame API #365

Open
GDYendell opened this issue Oct 17, 2024 · 1 comment

Comments

GDYendell commented Oct 17, 2024

From #360 and #361 it emerged that a lot of the naming and API of frames, chunks, images, etc. is very unclear. This is a list of things I think could be updated to provide better definitions to make the code clearer. All proposed names are up for discussion.

  • Remove usage of image and chunk from Frame - Frame should be as generic to the data it stores as possible; it primarily defines the structure that stores data passed between plugins
  • Fundamentally, Frame should have a pointer to some data and its size. To understand the data, we also need to know e.g. the datatype, shape and compression
  • Update Frame to have a concept of "data element shape" and a "count" of those elements to allow batching
  • It should be possible for downstream plugins to treat one Frame with 10 elements of shape 100,100 the same as 10 Frames with one element of shape 100,100. If upstream plugins need to batch data for efficiency, it should be possible for downstream plugins to get a view onto the data of one element at a time (see the sketch after this list). Currently there is not enough information to do this and the entire data from the Frame must be handled at once.
  • Frame.data_element_shape should map onto the inner N dimensions of the hdf file
  • Conceptually, Frame.count would generally match the outer chunk dimension in the hdf file for performance reasons, but it does not have to, for example if the reader application wants to slice the data differently than it comes out of the detector. It should be made clear that these are not the same thing - currently the logic uses them interchangeably (leading to the confusing fix in Make frames_per_file calculation in create_file respect outer chunk dimension #361)
  • Currently the main view of progress of the hdf plugin is the FramesWritten. This is not the most transparent view for the user, because they don't care how the data is batched as it moves through the plugin chain; they care about the number of elements of data, because this will generally map onto the points in the scan. For 2D data the element count is 1, so this is fine. For batched data, this is confusing and leads to the situation where the FramesWritten at the end of an acquisition does not match the size of the dataset and, conversely, the size of the dataset the user expects is not the number they should enter for FrameCount
  • If this refactor also helps in the case where data is split geographically and then double counted, that would be good too. Currently, e.g. for xspress where the channels are split across multiple writers, the FramesWritten at the end of the scan is N x the user-defined FrameCount.
  • Possibly we need a datapoints-per-frame variable to complete the set of variables that define what final dataset dimensions should be expected
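
To make this concrete, here is a rough sketch of what the generic Frame described above could look like. This is illustrative only - none of these names exist in the current code and all of them are up for discussion:

```cpp
// Sketch only - illustrative names, not the current odin-data API
#include <cstddef>
#include <cstdint>
#include <vector>

enum class DataType { UInt8, UInt16, UInt32, Float32, Float64 };
enum class Compression { None, LZ4, BSLZ4 };

class Frame {
public:
  // Raw buffer and its total size in bytes
  const void* data() const;
  std::size_t data_size() const;

  // Metadata needed to interpret the buffer
  DataType data_type() const;
  Compression compression() const;

  // Shape of a single data element - maps onto the inner N
  // dimensions of the hdf dataset, e.g. {100, 100}
  const std::vector<std::size_t>& data_element_shape() const;

  // Number of elements batched into this Frame
  std::size_t count() const;

  // View onto one element, so downstream plugins can treat one Frame
  // with 10 elements the same as 10 Frames with one element each
  // (only meaningful for uncompressed data in this sketch)
  const void* element_data(std::size_t index) const {
    return static_cast<const std::uint8_t*>(data()) +
           index * (data_size() / count());
  }
};
```

With something like this a downstream plugin can loop over count() and call element_data(i) to handle one element at a time, regardless of how the upstream plugin chose to batch the data.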
LuisFSegalla commented
Just adding a few thoughts based on what I found while trying to understand the issue here:

If setting the chunk size to 1 (not using block mode) and defining Frames per Block and Blocks per File to non-zero values, what happens is that every frame that's received by the plugin chain coming from the FP adapter will then be written to file. When the file reaches Frames per Block * Blocks per File frames it will close and another will be opened. The process will continue until the total number of frames configured is reached and the acquisition is finished.
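For example (illustrative numbers only): with Frames per Block = 100 and Blocks per File = 5, each file holds 100 * 5 = 500 frames before it is closed and the next file is opened.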

When using a chunk size greater than one, the concept of what a frame is "changes". What will happen is that as frames arrive at the adapter they'll be batched together until a number of frames equal to the chunk size is captured. This batch of frames is then pushed forward into the plugin chain. The problem seems to be that the whole batch of frames acquired is processed by a single process_frame call. If we still want to apply the concepts of Frames per Block and Blocks per File, it's necessary to take into consideration that the plugin chain (more specifically in this case the HDF plugin) will only see Total Number of Frames / Chunk Size frames arriving at it, even though each process_frame call will contain a frame that's composed of Chunk Size frames internally. It's necessary then to set the Frames per Block field accordingly, as that's used as a base to create the datasets and can cause the problem seen in the issue pointed out earlier.
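To put numbers on it (illustrative only): with 1000 total frames and a chunk size of 10, the HDF plugin only sees 1000 / 10 = 100 process_frame calls, each carrying a frame made up of 10 detector frames. So if the intention is to roll files every 500 detector frames, Frames per Block would presumably need to be expressed in these batched units (50 rather than 500) for the dataset sizing to come out right.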

I would suggest some small changes to odin-data, mostly in the Acquisition class, updating the process_frame method so that it takes the chunk size into account correctly.
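As a rough illustration of what I mean (placeholder types and names only, not the actual odin-data Acquisition code), the bookkeeping would be done in detector-frame units, scaling by the number of frames carried in each batched frame:

```cpp
// Sketch of the idea only - placeholder types/names, not the real Acquisition class
#include <cstddef>

struct FrameInfo {
  std::size_t count;  // detector frames carried by this batched frame (1 if no chunking)
};

struct AcquisitionState {
  std::size_t frames_per_block = 0;
  std::size_t blocks_per_file = 0;
  std::size_t frames_written = 0;          // progress in detector frames
  std::size_t frames_written_in_file = 0;  // detector frames in the current file
};

// True if the current file should be rolled over before writing this frame
bool needs_rollover(const AcquisitionState& s, const FrameInfo& f) {
  const std::size_t frames_per_file = s.frames_per_block * s.blocks_per_file;
  return frames_per_file > 0 && s.frames_written_in_file + f.count > frames_per_file;
}

// Account for one incoming (possibly batched) frame in detector-frame units
void account_frame(AcquisitionState& s, const FrameInfo& f) {
  if (needs_rollover(s, f)) {
    // close the current file and open the next one here
    s.frames_written_in_file = 0;
  }
  s.frames_written_in_file += f.count;
  s.frames_written += f.count;  // FramesWritten then matches the dataset size
}
```

With something along these lines, FramesWritten would count detector frames regardless of chunking, and file rollover would not depend on how the data happens to be batched on its way through the plugin chain.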
