Replace predicate pushdown in the Faker connector with column properties #24323

Merged 4 commits on Dec 26, 2024

@nineinchnick nineinchnick commented Dec 1, 2024

Description

Predicate pushdown in the Faker connector violates the SQL semantics,
because when applied to separate columns, correlation between columns is
not preserved, and returned results are not deterministic. The min,
max, and allowed_values column properties should be used instead.

Fixes #24147
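
The difference between filtering after generation and constraining generation itself can be sketched roughly as follows. This is an illustration only, not the connector's code: `BoundedLongGenerator` and `next` are hypothetical names, and it assumes a BIGINT-like column whose inclusive bounds arrive as nullable strings, with `null` meaning no bound on that side.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not taken from the Faker connector: it draws a value
// inside [min, max] directly instead of generating freely and filtering later.
final class BoundedLongGenerator
{
    private BoundedLongGenerator() {}

    // min and max are inclusive; null means "unbounded on this side" (an assumption).
    static long next(String min, String max)
    {
        long low = (min == null) ? Long.MIN_VALUE : Long.parseLong(min);
        long high = (max == null) ? Long.MAX_VALUE : Long.parseLong(max);
        if (low > high) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        if (high < Long.MAX_VALUE) {
            // The upper bound of nextLong is exclusive, so add one to include high.
            return random.nextLong(low, high + 1);
        }
        if (low > Long.MIN_VALUE) {
            // high + 1 would overflow: draw from the range shifted down by one, then shift back.
            return random.nextLong(low - 1, high) + 1;
        }
        // Both sides unbounded: any long is acceptable.
        return random.nextLong();
    }
}
```

Under these assumptions every generated value already satisfies the bounds, so no `WHERE` predicate has to reach the generator.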

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Faker
* Replace predicate pushdown with `min`, `max`, and `options` column properties. ({issue}`24147`)

import static java.util.Objects.requireNonNull;

public record FakerColumnHandle(
int columnIndex,
String name,
Type type,
double nullProbability,
- String generator)
+ String generator,
+ String min,
Member

Can you use Optionals please

Member Author

Why? We handle all cases when the value is null.
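
For reference, the `Optional` variant requested above might look roughly like this. It is a minimal sketch on a hypothetical, trimmed-down record: `ColumnBounds` and `ofNullable` are illustrative names, and the field set does not reflect the actual `FakerColumnHandle`.

```java
import java.util.Optional;

import static java.util.Objects.requireNonNull;

// Hypothetical record for illustration only; not the handle from this PR.
record ColumnBounds(Optional<String> min, Optional<String> max)
{
    ColumnBounds
    {
        // The Optional wrappers themselves must never be null.
        requireNonNull(min, "min is null");
        requireNonNull(max, "max is null");
    }

    // Convenience factory for callers that still hold nullable strings.
    static ColumnBounds ofNullable(String min, String max)
    {
        return new ColumnBounds(Optional.ofNullable(min), Optional.ofNullable(max));
    }
}
```

The trade-off is that absence becomes part of the type instead of relying on each call site to check for `null`.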

Member

@losipiuk losipiuk left a comment

some comments. looks nice

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions github-actions bot added the stale label Dec 23, 2024
@nineinchnick
Member Author

I'm still working on this.

@github-actions github-actions bot removed the stale label Dec 24, 2024
@nineinchnick nineinchnick force-pushed the faker-range-properties branch 2 times, most recently from f40d354 to b2f98a5 Compare December 26, 2024 13:52
@nineinchnick nineinchnick force-pushed the faker-range-properties branch 2 times, most recently from 90ee9b7 to 67d40c5 Compare December 26, 2024 16:37
@nineinchnick nineinchnick force-pushed the faker-range-properties branch from 67d40c5 to ca22d61 Compare December 26, 2024 16:43
Allow constraining generated values by setting the min, max, or
allowed_values column properties.
Predicate pushdown in the Faker connector violates the SQL semantics,
because when applied to separate columns, correlation between columns is
not preserved, and returned results are not deterministic. The `min`,
`max`, and `options` column properties should be used instead.
@nineinchnick nineinchnick force-pushed the faker-range-properties branch from ca22d61 to 2e7ea07 Compare December 26, 2024 17:11
@nineinchnick nineinchnick force-pushed the faker-range-properties branch from 2e7ea07 to 383869e Compare December 26, 2024 17:13
@raunaqmorarka raunaqmorarka merged commit 43e5ead into trinodb:master Dec 26, 2024
20 checks passed
@github-actions github-actions bot added this to the 469 milestone Dec 26, 2024
@nineinchnick nineinchnick deleted the faker-range-properties branch December 26, 2024 17:49
@@ -166,7 +175,7 @@ Faker supports the following non-character types:
- `UUID`

You can not use generator expressions for non-character-based columns. To limit
- their data range, specify constraints in the `WHERE` clause - see
+ their data range, set the `min` and/or `max` column properties - see
Member

just use "and" ..

Member Author

I'll fix this in #24585, I haven't updated the docs there yet.

@@ -102,6 +102,15 @@ The following table details all supported column properties.
sentence from the
[Lorem](https://javadoc.io/doc/net.datafaker/datafaker/latest/net/datafaker/providers/base/Lorem.html)
provider.
* - `min`
- Minimum generated value (inclusive). Cannot be set for character-based type
Member

So does it work for numeric and also date types?

Development

Successfully merging this pull request may close these issues.

Faker connector produces incorrect results
4 participants