Add a way to pull all available rows #122
I was just playing with the API, and I think this would be almost straightforward. We would set a condition that, if …
Couldn't we use the number of pages and limit of the first response of the last series of API calls to estimate?
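The estimate suggested here is simple arithmetic: given the total row count reported by the first response and the per-page limit, the number of API calls needed is a ceiling division. A minimal sketch (Python for illustration only; the package itself is R, and the function name here is an assumption):

```python
import math

def estimate_pages(total_rows: int, limit: int) -> int:
    """Estimate how many paged API calls are needed to pull every row,
    given the total row count reported by the first response and the
    per-page limit. Illustrative helper, not part of the package."""
    return math.ceil(total_rows / limit)

# E.g., with the API's 50k-row cap per call:
print(estimate_pages(123_456, 50_000))  # 3 pages
```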
I wouldn't leave out the possibility of giving incremental messages, but we wouldn't be able to say up front …
Gotcha, I see what you're saying now. I think there will be a big difference between the total number of dimension values (which we'll use to calculate the estimate) and the max number of dimension values for a given combination of dimension levels. E.g., there might be 500,000 page paths, but only 1 page path for a given combination of dimension levels in your breakdown.
I want to distinguish between stateful and stateless elements in the queries. Stateless elements are the same in every sub-query; stateful elements change depending on which sub-query is being performed.

The "query spec" contains all information about the global query. It's essentially a serialization of the arguments to `aw_freeform_table`, with certain guarantees about the contents. For example, `limit`, `page`, `dimensions`, and `sort` are always all the same length.

With a set of getter functions, you don't need to know exactly how the data structure is built. This is also more resilient to changes over time. It's a little clunky, but it's safe. The getter functions are all prefixed with `qs` for query spec.

This is the first step towards Issue benrwoodard#122.
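The query-spec idea above can be sketched as a plain record plus getter functions, so callers never depend on how the structure is built. This is an illustrative Python sketch, not the package's actual R implementation; everything except the `qs` prefix and the four field names mentioned above is an assumption:

```python
# A "query spec" serializes the global query arguments. The invariant:
# dimensions, limit, page, and sort always have the same length, one
# entry per breakdown level.
def make_query_spec(dimensions, limit, page, sort):
    assert len(dimensions) == len(limit) == len(page) == len(sort), \
        "per-level fields must all be the same length"
    return {"dimensions": list(dimensions), "limit": list(limit),
            "page": list(page), "sort": list(sort)}

# Getter functions (qs = query spec) hide the layout, so the underlying
# structure can change over time without breaking callers.
def qs_dimensions(spec):
    return spec["dimensions"]

def qs_limit(spec, level):
    return spec["limit"][level]

def qs_page(spec, level):
    return spec["page"][level]

def qs_sort(spec, level):
    return spec["sort"][level]

spec = make_query_spec(["page", "browser"], [50_000, 10], [0, 0], ["desc", "desc"])
print(qs_limit(spec, 0))  # 50000
```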
The API limits each call to 50k rows. The goal of this enhancement would be to add an "all" value as an option in the `top` argument so that all the results are returned. A possible solution is an option similar to what we did for date ranges, where "0" pulls all dates or hours: if a user passes "all" as the last `top` value, the last API call determines the number of pages needed to pull all the rows and then loops through the pages, compiling the final dataset.
It may also work to simply add an `all` argument set to TRUE or FALSE. Then the last API call would loop through the pages.
Theoretically, this is only viable for the last call, since there would be no way to pull 50k+ rows and then run additional API calls against each of them.
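The proposed "all" behavior amounts to: make the final level's first call, read the reported total row count, then loop through the remaining pages and concatenate the results. A language-agnostic sketch in Python with a stubbed API call (the real package is R; `fetch_page`, its return shape, and the row values are all stand-ins for illustration):

```python
import math

def fetch_page(page: int, limit: int, total: int = 123_456):
    """Stub standing in for one paged API call.
    Returns (rows_for_this_page, total_row_count)."""
    start = page * limit
    rows = list(range(start, min(start + limit, total)))
    return rows, total

def fetch_all(limit: int = 50_000):
    """Pull page 0, estimate the remaining pages from the reported
    total, then loop through them and compile the final dataset."""
    rows, total = fetch_page(0, limit)
    n_pages = math.ceil(total / limit)
    for page in range(1, n_pages):
        more, _ = fetch_page(page, limit)
        rows.extend(more)
    return rows

data = fetch_all()
print(len(data))  # 123456
```

Because the total is only known after the first response, this loop can also emit the incremental progress messages discussed above, even though the page count can't be stated up front.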