Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix join on arrays of unhashable types and allow hash join on all types supported at run-time #13388

Merged
merged 4 commits into from
Nov 19, 2024

Conversation

findepi
Copy link
Member

@findepi findepi commented Nov 13, 2024

Which issue does this PR close?

None

Rationale for this change

fix potential query failure due to incorrect can_hash result

What changes are included in this PR?

  • update can_hash
  • cleanup compute_hashes

Are these changes tested?

Yes, unit tests.
query-based tests are hard to write -- it is unclear how to construct Union or RunEndEncoded types in SQL

Are there any user-facing changes?

yes

The `downcast_primitive_array!` macro handles all primitive types
and only then delegates to fallbacks. It handles Decimal128 and
Decimal256 internally.
@github-actions github-actions bot added logical-expr Logical plan and expressions common Related to common crate labels Nov 13, 2024
@findepi findepi force-pushed the findepi/join-hashes branch from 43b5fac to 4961ca6 Compare November 13, 2024 11:12
@alamb
Copy link
Contributor

alamb commented Nov 13, 2024

query-based tests are hard to write -- it is unclear how to construct Union or RunEndEncoded types in SQL

One way to do so would be via arrow_cast

However, those types don't seem to be supported via arrow_cast yet:

> select arrow_cast(1, 'Union(Union))');
Error during planning: Unsupported type 'Union(Union))'. Must be a supported arrow type name such as 'Int32' or 'Timestamp(Nanosecond, None)'. Error unrecognized word: Union

@alamb
Copy link
Contributor

alamb commented Nov 13, 2024

The other way to do it is to programatically setup the sqllogictest, as is done for map.slt (which didn't have any way to create MapArrays when it was written):

@alamb alamb marked this pull request as draft November 15, 2024 18:45
@alamb
Copy link
Contributor

alamb commented Nov 15, 2024

Converting to draft as I think it is waiting on some tests. Please let me know if that isn't right and this is ready for another review

@findepi findepi changed the title Fix join on arrays of unhashable types Fix join on arrays of unhashable types and allow hash join on all types supported at run-time Nov 16, 2024
@findepi findepi force-pushed the findepi/join-hashes branch from 4961ca6 to d1fa0eb Compare November 16, 2024 17:17
@findepi
Copy link
Member Author

findepi commented Nov 16, 2024

This change has two aspects

  1. fixing hash joins on types that are not hashable, but can_hash considered them hashable. This is eg List of something unhashable. I couldn't write a test for this, since types such as listview or ree hit other problems.
  2. unblocking hash joins on types that are actually hashable, but can_hash considered them not hashable. This is not as edge-case'y, for example binary is hashable, but we didn't use hash joins when joining on binary. I added a plan test for this.

@findepi findepi marked this pull request as ready for review November 16, 2024 17:17
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 16, 2024
@findepi
Copy link
Member Author

findepi commented Nov 17, 2024

@alamb please take another look

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @findepi -- I think this is a nice improvement and cleanup

@@ -251,8 +253,11 @@ pub async fn register_table_with_many_types(ctx: &SessionContext) {
.unwrap();
ctx.register_catalog("my_catalog", Arc::new(catalog));

ctx.register_table("my_catalog.my_schema.t2", table_with_many_types())
.unwrap();
ctx.register_table(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 9fb5ff9 into apache:main Nov 19, 2024
25 checks passed
@findepi findepi deleted the findepi/join-hashes branch November 20, 2024 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants