Optimize the analysis SQL workflow #105
Related to: ooni/data#105
The join algorithm has been tweaked and the maximum allowed memory has also been increased. I think some more tuning and investigation should be done to understand whether we should switch to a different join algorithm. A useful resource is this series of blog posts from ClickHouse, which explains the performance/memory tradeoff between the various join algorithms and settings: https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3#full-sorting-merge-join. That said, I would say we can close this issue for the time being, as the current values are good enough that we no longer run into performance issues.
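For anyone picking up the tuning later, here is a minimal sketch of how the two knobs mentioned above could be attached to a single query instead of being changed globally. `join_algorithm` and `max_memory_usage` are real ClickHouse settings and can be passed in a trailing `SETTINGS` clause. The helper name `with_join_settings` and the example values are hypothetical, and this is not necessarily how the pipeline sets them.

```python
# Subset of the values ClickHouse accepts for the join_algorithm setting.
KNOWN_JOIN_ALGORITHMS = {
    "auto",
    "direct",
    "hash",
    "parallel_hash",
    "grace_hash",
    "partial_merge",
    "full_sorting_merge",
}


def with_join_settings(query: str, join_algorithm: str, max_memory_usage: int) -> str:
    """Append a per-query SETTINGS clause to a ClickHouse SELECT (hypothetical helper)."""
    if join_algorithm not in KNOWN_JOIN_ALGORITHMS:
        raise ValueError(f"unknown join algorithm: {join_algorithm!r}")
    # Drop trailing whitespace and a trailing semicolon before appending the clause.
    base = query.rstrip().rstrip(";")
    return (
        f"{base}\n"
        f"SETTINGS join_algorithm = '{join_algorithm}', "
        f"max_memory_usage = {int(max_memory_usage)}"
    )
```

Keeping the settings per query makes it easier to compare algorithms on the one heavy join without affecting the rest of the workflow.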
Currently the analysis workflow in OONI Pipeline v5 will sometimes run out of memory. This is because we are performing some very heavy join operations.
See:
As an interim fix I am going to bump up the memory limit; however, we ought to eventually consider whether we can improve the queries so that they don't have such a high memory requirement.
Probably one way to do it would be to split out the JOIN queries and do them in stages.
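To make the staging idea concrete, here is a rough sketch that runs the same join one day at a time, so each stage only has to hold a single day's rows. It uses Python's built-in `sqlite3` as a stand-in for ClickHouse so it can be run anywhere. The table names (`measurements`, `ground_truths`, `analysis`), the columns, and the choice of `day` as the staging key are all assumptions for illustration, not the actual pipeline schema.

```python
import sqlite3

# The join we want to compute; the staged version appends a per-day filter to it.
JOIN_SQL = """
    SELECT m.id, m.day, m.domain, g.expected
    FROM measurements AS m
    JOIN ground_truths AS g
      ON m.day = g.day AND m.domain = g.domain
"""


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE measurements (id INTEGER, day TEXT, domain TEXT);
        CREATE TABLE ground_truths (day TEXT, domain TEXT, expected TEXT);
        CREATE TABLE analysis (id INTEGER, day TEXT, domain TEXT, expected TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [
            (1, "2024-01-01", "example.org"),
            (2, "2024-01-01", "example.com"),
            (3, "2024-01-02", "example.org"),
        ],
    )
    conn.executemany(
        "INSERT INTO ground_truths VALUES (?, ?, ?)",
        [
            ("2024-01-01", "example.org", "ok"),
            ("2024-01-01", "example.com", "ok"),
            ("2024-01-02", "example.org", "blocked"),
        ],
    )
    return conn


def run_join_in_stages(conn: sqlite3.Connection) -> int:
    """Fill `analysis` one day at a time and return the number of stages run."""
    days = [
        row[0]
        for row in conn.execute("SELECT DISTINCT day FROM measurements ORDER BY day")
    ]
    for day in days:
        # Filtering both sides keeps each stage's join inputs limited to one day.
        conn.execute(
            "INSERT INTO analysis " + JOIN_SQL + " WHERE m.day = ? AND g.day = ?",
            (day, day),
        )
    conn.commit()
    return len(days)
```

The tradeoff is more queries and some per-stage overhead in exchange for a peak memory bound set by the largest single stage rather than by the whole time range.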