DRS paging #325
Comments
A provision for an index would be generally useful: a somewhat flexible structure that maps a key to a position in the bundle. I imagine that for large bundles it is inefficient to jump from page to page looking for something.
Besides the offset-with-limit approach Kaushik mentioned, you probably want a sort feature so you can perform a seek filter (i.e. retrieve items after some id) with a limit.
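To make the seek-filter idea concrete, here is a minimal sketch, assuming bundle items carry a sortable `id` field. The function name `seek_page` and the `after_id` and `limit` parameters are hypothetical and not part of any DRS specification.

```python
from typing import Any, Dict, List, Optional


def seek_page(
    items: List[Dict[str, Any]], after_id: Optional[str], limit: int
) -> List[Dict[str, Any]]:
    """Return up to `limit` items whose id sorts strictly after `after_id`."""
    # A stable sort order is what makes "retrieve after some id" well defined.
    ordered = sorted(items, key=lambda item: item["id"])
    if after_id is not None:
        ordered = [item for item in ordered if item["id"] > after_id]
    return ordered[:limit]


# Hypothetical bundle contents, used only for illustration.
contents = [{"id": f"obj-{n:03d}"} for n in range(1, 8)]
first = seek_page(contents, after_id=None, limit=3)
# The last id of one page becomes the seek key for the next page.
second = seek_page(contents, after_id=first[-1]["id"], limit=3)
```

Unlike an offset, the seek key stays valid when items are inserted before it, which is one reason this style may suit large bundles.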
Discussed within CRDC Imaging and Data Commons framework. Determined that 'standard' pagination capability is required.
In my opinion, the Google design pattern which David references above is more than adequate for expected Imaging Data Commons uses. Specifically, I don't see any value in being able to request an arbitrary page, a capability which the Google design pattern appears to support. I think, for our purposes, it would be adequate to just get the next page until all data has been received.
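A "get the next page until done" client could look roughly like the sketch below. It assumes each response carries a `contents` list and a `next_page_token` that is empty on the last page; `fetch_page` and `fake_fetch` are hypothetical stand-ins for a real DRS call.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[str]:
    """Yield every item by following next-page tokens until none is returned."""
    token: Optional[str] = None
    while True:
        response = fetch_page(token)
        yield from response["contents"]
        token = response.get("next_page_token")
        if not token:  # an empty or missing token means this was the last page
            break


# In-memory stand-in for a server, for illustration only.
_PAGES = {
    None: {"contents": ["a", "b"], "next_page_token": "p2"},
    "p2": {"contents": ["c"], "next_page_token": ""},
}


def fake_fetch(token: Optional[str]) -> Dict:
    return _PAGES[token]


received = list(iter_all(fake_fetch))
```

The client never computes page numbers; it only passes back whatever token the server handed it.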
I fully agree with what @bcli4d mentioned. I believe we should keep bundles as simple as possible. In an expanded bundle with many file objects, servers need to respond with large payloads quickly, and clients need to be able to receive and parse large response objects. Implementing the Google design pattern solves this and enables clients to recursively expand large bundles by just iterating through each page. When bundles are nested and the expand=True flag is set, pagination calculations will change, and we will need to discuss how the client will handle this.
Just curious, what would be the upper limit of the number of pages and page sizes? Imagine, if you will, this scenario :)
You have your DICOM images, but they are too big, so you create patches out of them, which become a multiple of the original set. You can perform searches on these patches, so you make them another repository of patched datasets, in addition to your original one. Then you decide that you want to perform pairwise or some other segmentation analysis of any incoming patches, so you build a generative model using a simple variational autoencoder (VAE). Sure, it helps you with image synthesis, but it can do much more than that, as it can quickly find similarities among any of the above datasets or new incoming data with refinement filters. This alternate representation of the data now becomes another addition to the query engine. In fact, it can speed up searches without much computational overhead under specific query design criteria. Since this could be helpful to others as well, you keep adding the derived data to your Cloud repositories/databases. Since some queries are based on generative models, the query results could become some power of the original set.
So we started with whole images, then we went to patched images, and then to VAE representations of images. All of these speed up the ability of clinicians/researchers to get interesting insight, while still remaining computationally manageable given the above data preparation.
So I come back to my original question: what might be the upper limit of the number of query result pages and page sizes? Would the upper limit of the number of pages be 5000, 10000, 100k, 500k or something more? At some point you might spend more time parsing through search results than gathering useful scientific insight. Hope it helps.
Getting back into the discussion, I'll focus on Binam's "keep bundles as simple as possible" and, to that end, on pagination, as that is the subject of this ticket.
The other topics discussed here aren't about pagination; there are other tickets for those.
Ideas:
Can we include...
has_more
next (URL to the next page)
previous (URL to the previous page)
page_count: number of pages (can we make this optional?)
items_per_page: number of bundle items per page
These would be present in every response.
Also consider adding a requested page size
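To illustrate how the fields listed above might fit together in one response, here is a minimal sketch. The `contents` key, the `page` and `page_size` query parameters, and the example base URL are assumptions made for illustration; only the field names `has_more`, `next`, `previous`, `page_count` and `items_per_page` come from the list above.

```python
import math
from typing import Any, Dict, List


def build_page(
    contents: List[Any], page: int, items_per_page: int, base_url: str
) -> Dict[str, Any]:
    """Build one paginated response (pages are numbered from 1)."""
    page_count = max(1, math.ceil(len(contents) / items_per_page))
    start = (page - 1) * items_per_page
    has_more = page < page_count

    def url(n: int) -> str:
        # Hypothetical URL scheme; the real query parameters are undecided.
        return f"{base_url}?page={n}&page_size={items_per_page}"

    return {
        "contents": contents[start:start + items_per_page],
        "has_more": has_more,
        "next": url(page + 1) if has_more else None,
        "previous": url(page - 1) if page > 1 else None,
        "page_count": page_count,  # could be made optional, as asked above
        "items_per_page": items_per_page,
    }


bundle = [f"object-{n}" for n in range(1, 8)]
page2 = build_page(
    bundle, page=2, items_per_page=3,
    base_url="https://drs.example.org/bundles/abc",
)
```

Here `items_per_page` echoes the requested page size; a server that caps page sizes might return a smaller value than the client asked for.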
dG: see also https://cloud.google.com/apis/design/design_patterns#list_pagination for a slightly simpler pagination API style guide.
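For comparison, the linked style guide uses a `page_size` and `page_token` in the request and a `next_page_token` in the response, where an empty token means there are no more results. A server-side sketch under those assumptions follows; the function name `list_contents`, the `contents` key, and the choice to encode an offset inside the token are illustrative only, since the token is meant to be opaque to clients.

```python
import base64
from typing import Any, Dict, List


def list_contents(
    items: List[Any], page_size: int, page_token: str = ""
) -> Dict[str, Any]:
    """Return one page plus an opaque token for the following page."""
    # Assumption: the token is just a base64-encoded offset. Clients must
    # treat it as opaque, so the server is free to change this encoding.
    if page_token:
        offset = int(base64.urlsafe_b64decode(page_token.encode()).decode())
    else:
        offset = 0
    end = offset + page_size
    if end < len(items):
        next_token = base64.urlsafe_b64encode(str(end).encode()).decode()
    else:
        next_token = ""  # empty token signals the final page
    return {"contents": items[offset:end], "next_page_token": next_token}


items = list(range(5))
p1 = list_contents(items, page_size=2)
p2 = list_contents(items, page_size=2, page_token=p1["next_page_token"])
p3 = list_contents(items, page_size=2, page_token=p2["next_page_token"])
```

This style has no `previous` or `page_count`, which is what makes it slightly simpler than the field list above.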
Next steps
Common approach, see link above
check with TASC Force -- Is GA4GH doing this in a consistent way? Is there a common pattern we should use across all our APIs?
Who will implement this? U. Chicago? Other groups? We need at least one driver.
U. Chicago implementer
CRDC, GDC/IDC
TOPMed
EMBL imaging site?