Hash Join of a hypertable with an index over an integer field with less than 200 join keys is suboptimal #2426
Hi! I guess this is quite a generic issue of joins with hypertables: joins tend to go Hash, which leads to no partition pruning, or to mass decompression if the hypertable is compressed.
If it's just a plain table, the next query gets a Nested Loop with index access:
If we make it a hypertable, we get an inefficient Hash Join. A LATERAL join, or adding GROUP BY or LIMIT, helps avoid the Hash Join (see the sketch below):
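For reference, a minimal sketch of the two query shapes being discussed. The table and column names (`metrics`, `series`, `series_id`) are assumptions for illustration, not the original schema:

```sql
-- Plain join selecting a subset of series. Against a regular table this
-- plans as a Nested Loop with index access; against a hypertable it
-- degrades to a Hash Join.
SELECT m.time, m.series_id, m.value
FROM metrics m
JOIN series s ON s.id = m.series_id
WHERE s.name LIKE 'cpu%';

-- LATERAL rewrite that tends to bring back the Nested Loop plan; adding
-- GROUP BY or LIMIT inside the subquery has a similar effect.
SELECT m.time, m.series_id, m.value
FROM series s,
     LATERAL (SELECT time, series_id, value
              FROM metrics
              WHERE series_id = s.id) m
WHERE s.name LIKE 'cpu%';
```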
But if we compress, both of these queries with Nested Loops will lead to decompression of all chunks.
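For context, one plausible compression setup for this scenario (the `segmentby`/`orderby` choices are assumptions; segmenting by `series_id` is what would normally let TimescaleDB skip non-matching compressed batches):

```sql
-- Hypothetical compression configuration for the assumed metrics table.
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'series_id',
  timescaledb.compress_orderby   = 'time DESC'
);

-- Compress all existing chunks.
SELECT compress_chunk(c) FROM show_chunks('metrics') c;
```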
But on big tables with parallel queries we noticed it performs much slower (at least 10x) than using a function with two separate queries (sketched below). UPD: I checked the same with a plain partitioned Postgres table (partitioned with pg_partman) - for all join queries we get a Nested Loop with runtime exclusion:
I'm not sure whether the bad performance (for the scalar query, where it works) is a runtime-exclusion problem specific to compressed chunks or a generic runtime-exclusion problem - I don't have a big, loaded plain partitioned table to compare against. But the plans without this exclusion, or even with a Hash Join, are clearly an issue connected with Timescale.
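A sketch of the "function with two separate queries" workaround mentioned above, under the same assumed schema (the function name and signature are hypothetical): the series ids are resolved first, so the planner sees a simple scalar-array condition on the hypertable instead of a join.

```sql
-- Hypothetical two-step workaround: fetch ids, then scan the hypertable.
CREATE OR REPLACE FUNCTION metrics_for_series(pattern text)
RETURNS SETOF metrics
LANGUAGE plpgsql STABLE AS $$
DECLARE
  ids bigint[];
BEGIN
  -- Step 1: resolve the matching series ids.
  SELECT array_agg(id) INTO ids FROM series WHERE name LIKE pattern;

  -- Step 2: query the hypertable with a plain = ANY condition, which
  -- allows chunk exclusion and index scans.
  RETURN QUERY
    SELECT * FROM metrics WHERE series_id = ANY (ids);
END;
$$;
```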
This is extracted from timescale/pg_prometheus#55, but is not specific to pg_prometheus.
Schema
Our data has about 250 unique series, each with 1 data point per second, i.e. about 250 data points per second in total.
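The original schema is not reproduced here; a minimal sketch consistent with the description (all names and types are assumptions) would be:

```sql
CREATE TABLE series (
  id   bigint PRIMARY KEY,
  name text NOT NULL
);

CREATE TABLE metrics (
  time      timestamptz NOT NULL,
  series_id bigint      NOT NULL REFERENCES series (id),
  value     double precision
);

-- Turn the metrics table into a hypertable partitioned on time.
SELECT create_hypertable('metrics', 'time');

-- Index over the integer field referenced in the issue title.
CREATE INDEX ON metrics (series_id, time DESC);
```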
Problem - Hash Join when selecting just 190 indexed series_id values is suboptimal
The query planner decided to do a Hash Join with just 192 rows from the `series` table, which doesn't make sense. When I replace the nested `SELECT` query with `series_id IN (21,22,...212)`, i.e. literally all the selected values, Timescale does the same (I don't post the query and the result because the query string is very long). But I can make Timescale do an Index Scan instead of a Hash Join by using a `series_id BETWEEN 21 AND 212` clause. This is 40% faster: