Define what constitutes a DataSource and a Key for Bluesky data #11

Open
padraic-shafer opened this issue Sep 8, 2024 · 1 comment

Comments

@padraic-shafer

PyMCA is rather flexible about what constitutes a DataSource, a sourceName, and a Key, as long as they can be used to retrieve the data that a user selects in the (x, y, m) table. It is therefore up to us to define these in a way that is practical for our anticipated use cases.

I think that one helpful simplification we could make is to focus the user toward selecting data streams within a CatalogOfBlueskyRuns, rather than selecting data from arbitrary Tiled nodes. This is in keeping with the overall aim of exploring and visualizing data from one or more Bluesky runs.

It then seems natural to associate the catalog (e.g. “…/smi/raw”, “…/smi/sandbox”, etc.) with the name of the DataSource rather than the Key. The Key should certainly contain the name of the data stream (e.g., primary, baseline, dark images, etc.). There is then perhaps some ambiguity in whether each run UUID is considered a separate source (part of the name) or if instead each run is part of the Key within the same source catalog.
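To make the ambiguity concrete, here is a minimal sketch of the two ways the run UUID could be split between the source name and the Key. Everything in it is hypothetical and for illustration only: `Address`, `run_in_key`, `run_in_source`, and `parse` are not PyMCA or Tiled names, and it assumes that run UUIDs and stream names never contain a `/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical (sourceName, Key) pair; not a PyMCA type."""
    source_name: str
    key: str


def run_in_key(catalog: str, run_uid: str, stream: str) -> Address:
    # Option A: one source per catalog; the run UUID is part of the Key.
    return Address(source_name=catalog, key=f"{run_uid}/{stream}")


def run_in_source(catalog: str, run_uid: str, stream: str) -> Address:
    # Option B: one source per run; the Key is only the stream name.
    return Address(source_name=f"{catalog}/{run_uid}", key=stream)


def parse(address: Address) -> tuple[str, str, str]:
    # Both options join to the same path, so (catalog, run, stream) can be
    # recovered the same way. Assumes no "/" in the run UUID or stream name.
    full_path = f"{address.source_name}/{address.key}"
    catalog, run_uid, stream = full_path.rsplit("/", 2)
    return catalog, run_uid, stream
```

Either option carries the same information; the choice only changes how many source objects exist and how much the Key has to encode.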

Having the run be part of the DataSource name rather than the Key is closer to some of the existing file-based DataSources, where there is one DataSource per filename (or perhaps a list of file names supplied as the “name” of a single source). However, this means a DataSource object is created for each run, which is extra overhead, although perhaps not a significant resource drain in practice. If needed, we could make these per-run data sources lightweight __slots__-based objects that delegate common functionality to a shared helper object.
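A rough sketch of what such a lightweight per-run object might look like follows. The class and method names (`CatalogHelper`, `RunSource`, `get_keys`, `get_data`) are made up for this example and are not the PyMCA DataSource interface, and a plain nested dict stands in for the Tiled catalog.

```python
class CatalogHelper:
    """Hypothetical shared helper that owns the per-catalog state."""

    def __init__(self, catalog_name, runs):
        self.catalog_name = catalog_name
        # Stand-in for a catalog client: {run_uid: {stream_name: data}}
        self._runs = runs

    def stream_names(self, run_uid):
        return sorted(self._runs[run_uid])

    def get_stream(self, run_uid, stream):
        return self._runs[run_uid][stream]


class RunSource:
    """Hypothetical per-run source; __slots__ avoids a per-instance __dict__."""

    __slots__ = ("_helper", "run_uid")

    def __init__(self, helper, run_uid):
        self._helper = helper  # shared by every run in the same catalog
        self.run_uid = run_uid

    @property
    def source_name(self):
        return f"{self._helper.catalog_name}/{self.run_uid}"

    def get_keys(self):
        # Keys are just the stream names of this run.
        return self._helper.stream_names(self.run_uid)

    def get_data(self, key):
        # Delegate the actual retrieval to the shared helper.
        return self._helper.get_stream(self.run_uid, key)
```

Each `RunSource` stores only two references, so creating one per run should stay cheap, while connection and caching logic would live once in the helper.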

For simplicity we should probably avoid associating the DataSource with more than one catalog; that is, we should not use a list of sourceNames. One reason is that when the QDispatcher fetches data for a new selection, it sends only the Key and the Selection to the DataSource object. The sourceName, or an index into the list of sourceNames, would then need to be included in the Key to reconstruct which sourceName was active during the selection. OTOH, the list of sourceNames (and list of Keys?) used by some of the file-based DataSources might have been an optimization to minimize the number of DataSource objects in memory(?).
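As an illustration of the extra bookkeeping a multi-catalog source would need, here is a small sketch. `MultiCatalogSource`, `make_key`, and `get_data` are hypothetical names rather than PyMCA API, and nested dicts stand in for the catalogs. Because the source receives only the Key and the Selection, the index of the catalog has to be packed into the Key and unpacked again on every fetch.

```python
class MultiCatalogSource:
    """Hypothetical source spanning several catalogs (the case to avoid)."""

    def __init__(self, catalogs):
        # catalogs: {source_name: {key: data}}, in a fixed order
        self.source_names = list(catalogs)
        self._catalogs = catalogs

    def make_key(self, source_name, key):
        # Embed the catalog index so it survives the round trip.
        index = self.source_names.index(source_name)
        return f"{index}:{key}"

    def get_data(self, key, selection=None):
        # Only the key and selection arrive here, so the catalog must be
        # recovered from the key itself. The selection is ignored in this sketch.
        index, _, inner_key = key.partition(":")
        catalog = self._catalogs[self.source_names[int(index)]]
        return catalog[inner_key]
```

With one catalog per source, the prefix and the parsing step disappear and the Key can be used as-is.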

@padraic-shafer
Author

@AbbyGi @hyperrealist @danielballan What opinions do you have on how we should organize the name(s) and key(s) of DataSource(s)?
