Add a way to pull all available rows #122

Open
benrwoodard opened this issue Feb 15, 2022 · 5 comments

@benrwoodard
Owner

The API limit is 50k rows. The goal of this enhancement is to accept "all" as a value in the 'top' argument so that every result is returned. One possible solution is similar to what we did for date ranges, where "0" pulls all dates or hours. If a user passes "all" as the last "top" value, the last API call would work out how many pages are needed to pull all the rows, then loop through those pages and compile the final dataset.
It may also work to simply have an "all" argument set to TRUE or FALSE. Then the last API call would loop through the pages.
In theory, this is only viable for the last call, since there would be no way to pull 50k+ rows and then make additional API calls on that.

@benrwoodard benrwoodard added the enhancement New feature or request label Feb 15, 2022
@charlie-gallagher
Collaborator

I was just playing with the API, and I think this would be almost straightforward. We would set a condition that, if top is "all", we keep querying until the response contains "lastPage=true". We wouldn't be able to predict how long it would take, of course, but we could definitely do it. I'll see if I can add that logic to the query function. Hopefully everything's modularized enough that it's just a matter of changing one function a little bit.
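A minimal sketch of the "keep querying until lastPage is true" idea, written in Python purely for illustration (the package itself is R). The `fetch_page` callable, the `rows` field name, and zero-based page numbering are assumptions made for this example; only `lastPage` and the 50k limit come from the thread.

```python
from typing import Callable, Dict, List

PAGE_LIMIT = 50_000  # row limit mentioned in this thread


def fetch_all_rows(fetch_page: Callable[[int, int], Dict],
                   limit: int = PAGE_LIMIT) -> List[dict]:
    """Request successive pages until the response reports lastPage == True."""
    rows: List[dict] = []
    page = 0  # assumption: pages are numbered from zero
    while True:
        response = fetch_page(page, limit)
        rows.extend(response.get("rows", []))  # "rows" is an assumed field name
        # Treat a missing lastPage flag as "done" so the loop cannot run forever.
        if response.get("lastPage", True):
            break
        page += 1
    return rows


def make_fake_api(total_rows: int) -> Callable[[int, int], Dict]:
    """Hypothetical stand-in for the real API, used only to exercise the loop."""
    def fetch_page(page: int, limit: int) -> Dict:
        start = page * limit
        end = min(start + limit, total_rows)
        return {
            "rows": [{"value": i} for i in range(start, end)],
            "lastPage": end >= total_rows,
        }
    return fetch_page


if __name__ == "__main__":
    all_rows = fetch_all_rows(make_fake_api(25), limit=10)
    print(len(all_rows))
```

Because the loop stops on the flag rather than on a precomputed page count, it does not need to know the total up front, which matches the point that the run time cannot be predicted in advance.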

@benrwoodard
Owner Author

Couldn't we use the number of pages and the limit from the first response in the last series of API calls to estimate how long it will take?

@charlie-gallagher
Collaborator

charlie-gallagher commented Apr 22, 2022

I wouldn't rule out giving incremental progress messages, but we wouldn't be able to say up front how long it would take.

@benrwoodard
Owner Author

I don't think my message was clear. The response to the API call already defines "totalElements" for us. So in theory we could send a request with a limit of 1, read "totalElements", do the simple math to work out how many pages we need given the 50k row limit, and then build our final request from there, right?
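The "simple math" here is a ceiling division of "totalElements" by the row limit. A small Python sketch, assuming the 50k limit quoted in this thread; the function name `pages_needed` is made up for illustration:

```python
PAGE_LIMIT = 50_000  # row limit mentioned in this thread


def pages_needed(total_elements: int, limit: int = PAGE_LIMIT) -> int:
    """Number of pages required to cover total_elements rows at `limit` rows per page."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    if total_elements < 0:
        raise ValueError("total_elements must not be negative")
    # Integer ceiling division: round up whenever there is a partial final page.
    return (total_elements + limit - 1) // limit


if __name__ == "__main__":
    print(pages_needed(120_001))
```

For example, a "totalElements" of 120,001 needs 3 pages: two full pages of 50,000 rows plus one partial page.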

@charlie-gallagher
Collaborator

Gotcha, I see what you're saying now. I think there will be a big difference between the total number of dimension values (which we'll use to calculate the estimate) and the max number of dimension values for a given combination of dimension levels.

For example, there might be 500,000 page paths overall, but only 1 page path for a given combination of dimension levels in your breakdown, so an estimate based on the overall total could be far too high.

charlie-gallagher pushed a commit to charlie-gallagher/adobeanalyticsr that referenced this issue Apr 26, 2022
I want to distinguish between stateful elements in the queries and
stateless. Stateless elements are the same in every sub-query; stateful
elements change depending on which sub-query is being performed.

The "query spec" contains all information about the global query. It's
essentially a serialization of the arguments to `aw_freeform_table`,
with certain guarantees about the contents. For example, `limit`,
`page`, `dimensions`, and `sort` are always all the same length.

With a set of getter functions, you don't need to know exactly how the
data structure is built. Also, this is more resilient to changes over
time. It's a little clunky, but it's also safe.

The getter functions are all prefixed with `qs` for query spec.

This is the first step towards Issue benrwoodard#122.
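The query spec idea described in this commit message could be sketched roughly as follows. This is an illustrative Python outline, not the package's actual R code: the class layout and the specific getter names (`qs_dimension`, `qs_limit`, `qs_page`, `qs_sort`) are hypothetical. Only the `qs` prefix and the equal-length guarantee for `limit`, `page`, `dimensions`, and `sort` come from the message above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuerySpec:
    """Stateless description of the global query; identical for every sub-query."""
    dimensions: Tuple[str, ...]
    limit: Tuple[int, ...]
    page: Tuple[int, ...]
    sort: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Enforce the guarantee that all per-dimension fields have the same length.
        lengths = {len(self.dimensions), len(self.limit),
                   len(self.page), len(self.sort)}
        if len(lengths) != 1:
            raise ValueError("dimensions, limit, page, and sort must have equal length")


# Getter functions hide how the spec is stored, so callers survive layout changes.
def qs_dimension(spec: QuerySpec, level: int) -> str:
    return spec.dimensions[level]


def qs_limit(spec: QuerySpec, level: int) -> int:
    return spec.limit[level]


def qs_page(spec: QuerySpec, level: int) -> int:
    return spec.page[level]


def qs_sort(spec: QuerySpec, level: int) -> str:
    return spec.sort[level]


if __name__ == "__main__":
    spec = QuerySpec(
        dimensions=("page", "daterangeday"),
        limit=(10, 50_000),
        page=(0, 0),
        sort=("desc", "asc"),
    )
    print(qs_dimension(spec, 1), qs_limit(spec, 1))
```

Validating the lengths once at construction means each getter can index by breakdown level without rechecking, which is the kind of guarantee the commit message refers to.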