Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the analysis SQL workflow #105

Closed
hellais opened this issue Jan 9, 2025 · 1 comment
Closed

Optimize the analysis SQL workflow #105

hellais opened this issue Jan 9, 2025 · 1 comment

Comments

@hellais
Copy link
Member

hellais commented Jan 9, 2025

Currently the analysis workflow in OONI Pipeline v5 will sometimes run out of memory. This is because we are performing some very heavy join operations.

See:

DB::Exception: Memory limit (for query) exceeded: would use 10.30 GiB (attempt to allocate chunk of 67108864 bytes), maximum: 10.25 GiB.: While executing JoiningTransform. Stack trace:
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000d16d15b
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000007dacdec
2. DB::Exception::Exception<char const*, char const*, String, long&, String, char const*, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<long&>::type, std::type_identity<String>::type, std::type_identity<char const*>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x000000000d18aca4
3. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000d189f99
4. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000d189d3d
5. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x000000000d13ff87
6. void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x0000000007dba626
7. DB::ColumnString::insertRangeFrom(DB::IColumn const&, unsigned long, unsigned long) @ 0x0000000011696bbe
8. DB::ColumnArray::insertFrom(DB::IColumn const&, unsigned long) @ 0x00000000115d5657

As an interim fix I am going to bump up the memory limit, however we ought to eventually consider if we can do something to improve the queries so that it doesn't have such a high memory requirement.

Probably one way to do it would be to split out the JOIN queries and do them in stages.

@hellais hellais self-assigned this Jan 9, 2025
hellais added a commit that referenced this issue Jan 9, 2025
hotfix related to: #105
hellais added a commit to ooni/devops that referenced this issue Jan 9, 2025
hellais added a commit to ooni/devops that referenced this issue Jan 9, 2025
hellais added a commit that referenced this issue Jan 24, 2025
@hellais hellais moved this to In Progress in Sprint Planning Jan 24, 2025
@hellais
Copy link
Member Author

hellais commented Jan 24, 2025

The join algorithm has been tweaked and the maximum allowed memory has also been increased.

I think some more tuning and investigation should be done to understand if we should use a join algorithm other than grace_join or if different values of grace_hash_join_initial_buckets should be used.

A useful resource are these blog posts from clickhouse that explain the performance/memory tradeoff between the various join algorithms and settings: https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3#full-sorting-merge-join.

That said I would say we can close this issue for the time being as the values are sufficiently good to not run into performance issues anymore.

@hellais hellais closed this as completed Jan 24, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Sprint Planning Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

1 participant