DRS paging #325

briandoconnor · 2020-07-13T17:54:22Z

Ideas:
- Can we include...
  - has_more
  - next (URL to the next page)
  - previous (URL to the previous page)
  - page_count: number of pages (can we make this optional?)
  - items_per_page: number of bundle items per page
- These would be present in every response.
- Also consider adding a requested page size
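
To make the fields listed above concrete, here is a minimal sketch of what a server could return for one page. It is not a proposed spec: the `build_page` helper, the `contents` key, the `page` query parameter and the example URL are all assumptions made for illustration.

```python
from math import ceil
from typing import Any, Dict, List


def build_page(items: List[Any], page: int, items_per_page: int,
               base_url: str) -> Dict[str, Any]:
    """Return one page of `items` plus the paging fields from the list above.

    Hypothetical helper: the `page` / `items_per_page` query parameters are
    assumptions, not part of DRS.
    """
    if page < 1 or items_per_page < 1:
        raise ValueError("page and items_per_page must be >= 1")

    # An empty bundle is still reported as a single (empty) page.
    page_count = max(1, ceil(len(items) / items_per_page))
    start = (page - 1) * items_per_page
    has_more = page < page_count

    def url(p: int) -> str:
        return f"{base_url}?page={p}&items_per_page={items_per_page}"

    return {
        "contents": items[start:start + items_per_page],
        "has_more": has_more,
        "next": url(page + 1) if has_more else None,
        "previous": url(page - 1) if page > 1 else None,
        "page_count": page_count,
        "items_per_page": items_per_page,
    }
```

Here `next` and `previous` are simply `None` at the ends of the sequence, which is one possible convention; omitting the fields would be another.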

dG: see also https://cloud.google.com/apis/design/design_patterns#list_pagination for a slightly simpler pagination API style guide.

Next steps
- Common approach, see link above
- Check with TASC Force: is GA4GH doing this in a consistent way? Is there a common pattern we should use across all our APIs?
- Who will implement this? U. Chicago? Other groups? We need at least one driver.
  - U. Chicago implementer
  - CRDC, GDC/IDC
  - TOPMed
  - EMBL imaging site?

ghost · 2020-07-13T17:57:35Z

A provision for an index is something that would be generally useful. A somewhat flexible structure that mapped a key to a position in the bundle. I imagine for large bundles it is inefficient to jump from page to page looking for something.
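
As a rough illustration of the kind of index meant here, the sketch below maps a key to its position in the bundle and derives the page it falls on. The function names are hypothetical, and it assumes keys are unique and that the bundle order is stable between requests.

```python
from typing import Dict, List


def build_index(names: List[str]) -> Dict[str, int]:
    """Map each key to its zero-based position in the bundle (keys assumed unique)."""
    return {name: pos for pos, name in enumerate(names)}


def page_for(index: Dict[str, int], key: str, items_per_page: int) -> int:
    """Return the one-based page number that holds `key`."""
    return index[key] // items_per_page + 1


names = ["a.bam", "b.bam", "c.bam", "d.bam", "e.bam"]
index = build_index(names)
print(page_for(index, "d.bam", items_per_page=2))
```

With such an index a client could go straight to the right page instead of walking pages one by one.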

pgrosu · 2020-07-13T19:30:40Z

Besides the offset-with-limit approach Kaushik mentioned, you probably want a sort feature so you can perform a seek-filter (i.e. retrieve after some id) with a limit.
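
A minimal sketch of the seek-filter idea, assuming results are sorted by a unique id; `seek_page` and `after_id` are names made up for this example. Instead of an offset, the client passes the last id it has seen, and the server returns the next `limit` items after it.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def seek_page(sorted_ids: List[str], after_id: Optional[str],
              limit: int) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` ids strictly after `after_id`, plus the next cursor.

    `sorted_ids` must already be sorted; the cursor is None on the last page.
    """
    start = 0 if after_id is None else bisect_right(sorted_ids, after_id)
    page = sorted_ids[start:start + limit]
    more = start + limit < len(sorted_ids)
    next_after = page[-1] if page and more else None
    return page, next_after


ids = sorted(["drs-05", "drs-01", "drs-03", "drs-02", "drs-04"])
cursor: Optional[str] = None
while True:
    page, cursor = seek_page(ids, cursor, limit=2)
    print(page)
    if cursor is None:
        break
```

Unlike offset-with-limit, the position of a page does not shift if items are inserted before the cursor while the client is paging.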

ianfore · 2021-03-22T20:49:31Z

Discussed within CRDC Imaging and Data Commons framework. Determined that 'standard' pagination capability is required.
'Standard' pagination capability would amount to what was suggested at the top of the issue. Copied and annotated below.
However, rather than specifying all those from scratch suggest that:

There are a number of standard approaches already in use, both in and out of GA4GH. Suggest adopting one of these rather than coming up with a new one.
A common approach to pagination would be desirable across GA4GH APIs, unless a really different capability is required.
If the previous point will take a long time to resolve, a DRS-specific approach should be taken, as there are immediate needs.
- has_more
  or
  page n of page_count
- page_count: number of pages (can we make this optional?)
  - What does its absence mean? If it means there is only one page, is it not easier to require page_count and set the value to 1 in that case?
- Navigation
  - next (URL to the next page)
    - or to a specific page
  - previous (URL to the previous page) - seems less of a case for this
- items_per_page: number of bundle items per page
  - Allow page size to be part of the request
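
To illustrate the question about an absent page_count, here is a sketch of how a client might decide whether it has reached the last page. The `is_last_page` helper is hypothetical, and treating a response with neither field as an error is just one possible reading; it is exactly the ambiguity that requiring page_count would remove.

```python
from typing import Any, Dict


def is_last_page(response: Dict[str, Any], current_page: int) -> bool:
    """Decide whether `current_page` is the last one, from either paging style."""
    if "has_more" in response:
        return not response["has_more"]
    if "page_count" in response:
        return current_page >= response["page_count"]
    # Neither field present: the client cannot tell "single page" from
    # "server does not report paging", so this sketch refuses to guess.
    raise ValueError("response carries neither has_more nor page_count")


print(is_last_page({"has_more": True}, current_page=1))
print(is_last_page({"page_count": 1}, current_page=1))
```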

bcli4d · 2021-04-07T20:47:11Z

In my opinion, the Google design pattern which David references above is more than adequate for expected Imaging Data Commons uses. Specifically, I don't see any value in being able to request an arbitrary page, a capability which the Google design pattern appears to support. I think, for our purposes, it would be adequate to just get the next page until all data has been received.
I'm trying to imagine a scenario in which random page access could be useful. Ian talked about viewing pyramidal, tiled images. It seems that this would require that the client knows the server access pattern, e.g. how it walks a bundle hierarchy (a bundle of bundles) and in what order results are then returned. I'm guessing that specifying this would be difficult at best for some implementations.
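
For reference, the "get the next page until all data has been received" loop can be sketched as below. The `page_size`, `page_token` and `next_page_token` names follow the Google design pattern linked above; `list_contents` is a stand-in for a server, and encoding the token as an offset is purely an assumption of this sketch, since the client treats the token as opaque.

```python
from typing import Any, Callable, Dict, Iterator, List


def list_contents(contents: List[str], page_size: int,
                  page_token: str = "") -> Dict[str, Any]:
    """Toy server: the token is just the next offset, encoded as a string."""
    start = int(page_token) if page_token else 0
    end = start + page_size
    return {
        "contents": contents[start:end],
        # An empty next_page_token signals that there are no more pages.
        "next_page_token": str(end) if end < len(contents) else "",
    }


def iterate_all(fetch_page: Callable[[str], Dict[str, Any]]) -> Iterator[str]:
    """Client loop: keep requesting the next page until the token is empty."""
    token = ""
    while True:
        response = fetch_page(token)
        yield from response["contents"]
        token = response["next_page_token"]
        if not token:
            break


bundle = [f"obj-{i}" for i in range(7)]
for name in iterate_all(lambda t: list_contents(bundle, 3, t)):
    print(name)
```

The client never computes a page number, so the server is free to change how it walks the bundle without breaking callers.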

BinamB · 2021-04-08T19:24:13Z

I fully agree with what @bcli4d mentioned. I believe we should keep bundles as simple as possible. In an expanded bundle with many file objects, servers need to respond with large payloads quickly and clients will need to be able to receive and parse large response objects. Implementing the Google design pattern solves for this and enables clients to recursively expand large bundles by just iterating through each page.

When bundles are nested and the expand=True flag is set, pagination calculations will change, and we will need to discuss how the client will handle this.

pgrosu · 2021-04-08T20:46:07Z

Just curious, what would be the upper limit of the number of pages and page sizes? Imagine, if you will, this scenario :) You got your DICOM images, but they are too big so you create patches out of them, which become a multiple of the original set. You can perform searches on these patches, so you make them another repository of patched datasets, in addition to your original one. Then you decide that you want to perform pairwise or some other segmentation analysis of any incoming patches, so you build a generative model using a simple variational autoencoder (VAE). Sure, it helps you with image synthesis, but it can do much more than that, as it can quickly find similarities among any of the above datasets or new incoming data with refinement filters. This alternate representation of the data now becomes another addition to the query engine. In fact, it can speed up searches without much computational overhead under specific query design criteria. Since this could be helpful to others as well, you keep adding the derived data to your Cloud repositories/databases. Since some queries are based on generative models, the query results could become some power of the original set. So we started with whole images, then went to patched images, and then to a VAE representation of images. All of these speed up the ability of clinicians/researchers to get interesting insight, while still remaining computationally manageable given the above data preparation.

So I come back to my original question, what might be the upper limit of the number of query result pages and page sizes? Would the upper limit of the number of pages be 5000, 10000, 100k, 500k or something more? At some point you might spend more time parsing through search results, rather than gathering useful scientific insight.

Hope it helps,
Paul

ianfore · 2021-04-12T15:50:03Z

In getting back into the discussion, I'll focus on Binam's "keep bundles as simple as possible", and on pagination, as that is the subject of this ticket.

A bundle is an unstructured collection of DRS responses in response to a request consisting of multiple DRS ids
-- This is the need expressed in Improve support for containers that contain *lots* of Objects #286, DRS bulk requests #334
Pagination is correspondingly simple and just needs to support how you page through the results of such a request
Bundle expansion should be removed from the spec (makes pagination simpler)

The discussions of other things here aren't about pagination - there are other tickets for those.
Paul - there's a lot in your comment which we could discuss in those other tickets. I'm working up a DICOM using existing DRS ids from the IDC which Bill and Binam work on. But will comment on one of your points here.
"You got your DICOM images, but they are too big so you create patches out of them". Creating patches out of them amounts to the multiple-ID request referred to above, i.e. you make a request consisting of multiple DRS ids that you choose. That patching/slicing/bundling is not predetermined: you, the client author, decide it according to the needs of your application and request the 'patch' you want.

briandoconnor self-assigned this Jul 13, 2020

briandoconnor added Project: DRS Priority: High labels Jul 13, 2020

ianfore added the Function:Bundles Related to Bundle functionality label Jul 29, 2020

This was referenced Jan 25, 2021

Improve support for containers that contain *lots* of Objects #286

Open

Overall DRS scaling concerns [EPIC] #342

Open

briandoconnor unassigned briandoconnor Jan 25, 2021

briandoconnor added the 2021-ga4gh-connect label Mar 4, 2021

This was referenced Sep 14, 2021

contents array pagination of DRS Object bundle #366

Open

DRS bundle contents pagination #367

Open
