Add a way to pull all available rows #122
I was just playing with the API, and I think this would be almost straightforward. We would set a condition that, if …
Couldn't we use the number of pages and limit of the first response of the last series of API calls to estimate?
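The estimate suggested here is simple arithmetic: given the total row count reported by the first response and the per-page limit, the number of API calls needed is a ceiling division. A minimal sketch (Python for illustration only; the package itself is R, and the function name here is an assumption):

```python
import math

def estimate_pages(total_rows: int, limit: int) -> int:
    """Estimate how many paged API calls are needed to pull every row,
    given the total row count reported by the first response and the
    per-page limit. Illustrative helper, not part of the package."""
    return math.ceil(total_rows / limit)

# E.g., with the API's 50k-row cap per call:
print(estimate_pages(123_456, 50_000))  # 3 pages
```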
I wouldn't leave out the possibility of giving incremental messages, but we wouldn't be able to say up front …
Gotcha, I see what you're saying now. I think there will be a big difference between the total number of dimension values (which we'll use to calculate the estimate) and the max number of dimension values for a given combination of dimension levels. E.g., there might be 500,000 page paths, but only 1 page path for a given combination of dimension levels in your breakdown.
I want to distinguish between stateful and stateless elements in the queries. Stateless elements are the same in every sub-query; stateful elements change depending on which sub-query is being performed.

The "query spec" contains all information about the global query. It's essentially a serialization of the arguments to `aw_freeform_table`, with certain guarantees about the contents. For example, `limit`, `page`, `dimensions`, and `sort` are always all the same length.

With a set of getter functions, you don't need to know exactly how the data structure is built. This is also more resilient to changes over time. It's a little clunky, but it's safe. The getter functions are all prefixed with `qs` for query spec.

This is the first step towards Issue benrwoodard#122.
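The query-spec idea above can be sketched as a plain record plus getter functions, so callers never depend on how the structure is built. This is an illustrative Python sketch, not the package's actual R implementation; everything except the `qs` prefix and the four field names mentioned above is an assumption:

```python
# A "query spec" serializes the global query arguments. The invariant:
# dimensions, limit, page, and sort always have the same length, one
# entry per breakdown level.
def make_query_spec(dimensions, limit, page, sort):
    assert len(dimensions) == len(limit) == len(page) == len(sort), \
        "per-level fields must all be the same length"
    return {"dimensions": list(dimensions), "limit": list(limit),
            "page": list(page), "sort": list(sort)}

# Getter functions (qs = query spec) hide the layout, so the underlying
# structure can change over time without breaking callers.
def qs_dimensions(spec):
    return spec["dimensions"]

def qs_limit(spec, level):
    return spec["limit"][level]

def qs_page(spec, level):
    return spec["page"][level]

def qs_sort(spec, level):
    return spec["sort"][level]

spec = make_query_spec(["page", "browser"], [50_000, 10], [0, 0], ["desc", "desc"])
print(qs_limit(spec, 0))  # 50000
```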
The API limits each call to 50k rows. The goal of this enhancement would be to add an "all" value as an option in the `top` argument so that all the results are returned. A possible solution is an option similar to what we did for date ranges, where "0" pulls all dates or hours: if a user passes "all" as the last `top` value, the last API call determines the number of pages needed to pull all the rows and then loops through the pages, compiling the final dataset.
It may also work to simply add an `all` argument set to TRUE or FALSE. Then the last API call would loop through the pages.
Theoretically, this is only viable for the last call, since there would be no way to pull 50k+ rows and then run additional API calls against each of them.
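The proposed "all" behavior amounts to: make the final level's first call, read the reported total row count, then loop through the remaining pages and concatenate the results. A language-agnostic sketch in Python with a stubbed API call (the real package is R; `fetch_page`, its return shape, and the row values are all stand-ins for illustration):

```python
import math

def fetch_page(page: int, limit: int, total: int = 123_456):
    """Stub standing in for one paged API call.
    Returns (rows_for_this_page, total_row_count)."""
    start = page * limit
    rows = list(range(start, min(start + limit, total)))
    return rows, total

def fetch_all(limit: int = 50_000):
    """Pull page 0, estimate the remaining pages from the reported
    total, then loop through them and compile the final dataset."""
    rows, total = fetch_page(0, limit)
    n_pages = math.ceil(total / limit)
    for page in range(1, n_pages):
        more, _ = fetch_page(page, limit)
        rows.extend(more)
    return rows

data = fetch_all()
print(len(data))  # 123456
```

Because the total is only known after the first response, this loop can also emit the incremental progress messages discussed above, even though the page count can't be stated up front.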