Clarify Frame API #365

Open
GDYendell opened this issue Oct 17, 2024 · 1 comment

Comments

GDYendell commented Oct 17, 2024

From #360 and #361 it emerged that a lot of the naming and API of frames, chunks, images, etc. is very unclear. This is a list of things I think could be updated to provide better definitions to make the code clearer. All proposed names are up for discussion.

  • Remove usage of image and chunk from Frame - Frame should be as generic to the data it stores as possible; it primarily defines the structure that stores data passed between plugins
  • Fundamentally, Frame should have a pointer to some data and its size. To understand the data, we also need to know e.g. the datatype, shape and compression
  • Update Frame to have a concept of "data element shape" and a "count" of those elements to allow batching
  • It should be possible for downstream plugins to treat one Frame with 10 elements of shape 100,100 the same as 10 Frames with one element of shape 100,100. If upstream plugins need to batch data for efficiency, it should be possible for downstream plugins to get a view onto the data of one element at a time (see the sketch after this list). Currently there is not enough information to do this and the entire data from the Frame must be handled at once.
  • Frame.data_element_shape should map onto the inner N dimensions of the hdf file
  • Conceptually, Frame.count would generally match the outer chunk dimension in the hdf file for performance reasons, but it does not have to, for example if the reader application wants to slice the data differently than it comes out of the detector. It should be made clear that these are not the same thing - currently the logic uses them interchangeably (leading to the confusing fix in Make frames_per_file calculation in create_file respect outer chunk dimension #361)
  • Currently the main view of progress of the hdf plugin is the FramesWritten. This is not the most transparent view for the user, because they don't care how the data is batched as it moves through the plugin chain; they care about the number of elements of data, because this will generally map onto the points in the scan. For 2D data the element count is 1, so this is fine. For batched data, this is confusing and leads to the situation where the FramesWritten at the end of an acquisition does not match the size of the dataset and, conversely, the size of the dataset the user expects is not the number they should enter for FrameCount
  • If this refactor also helps in the case where data is split geographically and then double counted, that would be good too. Currently, e.g. for xspress where the channels are split across multiple writers, the FramesWritten at the end of the scan is N x the user-defined FrameCount.
  • Possibly we need a datapoints-per-frame variable to complete the set of variables that define what final dataset dimensions should be expected
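
To make this concrete, here is a rough sketch of what the generic Frame described above could look like. This is illustrative only - none of these names exist in the current code and all of them are up for discussion:

```cpp
// Sketch only - illustrative names, not the current odin-data API
#include <cstddef>
#include <cstdint>
#include <vector>

enum class DataType { UInt8, UInt16, UInt32, Float32, Float64 };
enum class Compression { None, LZ4, BSLZ4 };

class Frame {
public:
  // Raw buffer and its total size in bytes
  const void* data() const;
  std::size_t data_size() const;

  // Metadata needed to interpret the buffer
  DataType data_type() const;
  Compression compression() const;

  // Shape of a single data element - maps onto the inner N
  // dimensions of the hdf dataset, e.g. {100, 100}
  const std::vector<std::size_t>& data_element_shape() const;

  // Number of elements batched into this Frame
  std::size_t count() const;

  // View onto one element, so downstream plugins can treat one Frame
  // with 10 elements the same as 10 Frames with one element each
  // (only meaningful for uncompressed data in this sketch)
  const void* element_data(std::size_t index) const {
    return static_cast<const std::uint8_t*>(data()) +
           index * (data_size() / count());
  }
};
```

With something like this a downstream plugin can loop over count() and call element_data(i) to handle one element at a time, regardless of how the upstream plugin chose to batch the data.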
LuisFSegalla commented
Just adding a few thoughts based on what I found while trying to understand the issue here:

If setting the chunk size to 1 (not using block mode) and defining Frames per Block and Blocks per File to non-zero values, what happens is that every frame that's received by the plugin chain coming from the FP adapter will then be written to file. When the file reaches Frames per Block * Blocks per File frames it will close and another will be opened. The process will continue until the total number of frames configured is reached and the acquisition is finished.
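For example (illustrative numbers only): with Frames per Block = 100 and Blocks per File = 5, each file holds 100 * 5 = 500 frames before it is closed and the next file is opened.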

When using a chunk size greater than one, the concept of what a frame is "changes". What will happen is that as frames arrive at the adapter they'll be batched together until a number of frames equal to the chunk size is captured. This batch of frames is then pushed forward into the plugin chain. The problem seems to be that the whole batch of frames acquired is processed by a single process_frame call. If we still want to apply the concepts of Frames per Block and Blocks per File, it's necessary to take into consideration that the plugin chain (more specifically in this case the HDF plugin) will only see Total Number of Frames / Chunk Size frames arriving at it, even though each process_frame call will contain a frame that's composed of Chunk Size frames internally. It's necessary then to set the Frames per Block field accordingly, as that's used as a base to create the datasets and can cause the problem seen in the issue pointed out earlier.
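To put numbers on it (illustrative only): with 1000 total frames and a chunk size of 10, the HDF plugin only sees 1000 / 10 = 100 process_frame calls, each carrying a frame made up of 10 detector frames. So if the intention is to roll files every 500 detector frames, Frames per Block would presumably need to be expressed in these batched units (50 rather than 500) for the dataset sizing to come out right.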

I would suggest some small changes to odin-data, mostly in the Acquisition class, updating the process_frame method so that it takes the chunk size into account correctly.
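As a rough illustration of what I mean (placeholder types and names only, not the actual odin-data Acquisition code), the bookkeeping would be done in detector-frame units, scaling by the number of frames carried in each batched frame:

```cpp
// Sketch of the idea only - placeholder types/names, not the real Acquisition class
#include <cstddef>

struct FrameInfo {
  std::size_t count;  // detector frames carried by this batched frame (1 if no chunking)
};

struct AcquisitionState {
  std::size_t frames_per_block = 0;
  std::size_t blocks_per_file = 0;
  std::size_t frames_written = 0;          // progress in detector frames
  std::size_t frames_written_in_file = 0;  // detector frames in the current file
};

// True if the current file should be rolled over before writing this frame
bool needs_rollover(const AcquisitionState& s, const FrameInfo& f) {
  const std::size_t frames_per_file = s.frames_per_block * s.blocks_per_file;
  return frames_per_file > 0 && s.frames_written_in_file + f.count > frames_per_file;
}

// Account for one incoming (possibly batched) frame in detector-frame units
void account_frame(AcquisitionState& s, const FrameInfo& f) {
  if (needs_rollover(s, f)) {
    // close the current file and open the next one here
    s.frames_written_in_file = 0;
  }
  s.frames_written_in_file += f.count;
  s.frames_written += f.count;  // FramesWritten then matches the dataset size
}
```

With something along these lines, FramesWritten would count detector frames regardless of chunking, and file rollover would not depend on how the data happens to be batched on its way through the plugin chain.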
