Provide smart defaults for generalization. #281

Open
cristianberneanu opened this issue Dec 2, 2021 · 0 comments
Labels
enhancement New feature or request

Comments

@cristianberneanu
Contributor

When a high-cardinality column is selected, the output consists mostly of suppressed buckets, and it usually takes a long time to compute. This makes for a poor user experience.

I think it would be worthwhile to compute some estimates from the file preview rows and derive smart generalization defaults from them. What currently comes to mind:

- If the column cardinality is higher than the entity count divided by the average suppression threshold, we provide a default that generalizes the column into a value space one order of magnitude smaller:
  - For text columns, we compute the average value length and subtract 1.
  - For numeric columns, we compute the average difference between two values and round it up to the next power of 10.
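A minimal Python sketch of the heuristics above, under several assumptions: all function names are hypothetical; the column cardinality is assumed to be estimated elsewhere (for example from the preview rows) and passed in; "average difference between two values" is interpreted as the mean gap between consecutive distinct sorted values; and rounding the average text length to an integer (with a floor of 1) is an assumption, not part of the proposal.

```python
import statistics
from typing import Optional, Sequence, Tuple, Union

Value = Union[str, float, int]


def needs_generalization(cardinality: float, entity_count: float,
                         avg_suppression_threshold: float) -> bool:
    # Roughly entity_count / threshold buckets can pass suppression;
    # more distinct values than that means most buckets get suppressed.
    return cardinality > entity_count / avg_suppression_threshold


def default_substring_length(values: Sequence[str]) -> int:
    # Average value length minus 1 (rounding and the minimum of 1 are assumptions).
    avg_len = statistics.mean(len(v) for v in values)
    return max(1, round(avg_len) - 1)


def next_power_of_10(x: float) -> float:
    # Smallest power of 10 that is >= x.
    if x <= 0:
        raise ValueError("x must be positive")
    p = 1.0
    while p < x:
        p *= 10
    while p / 10 >= x:
        p /= 10
    return p


def default_bin_size(values: Sequence[float]) -> float:
    # Assumption: "average difference" = mean gap between sorted distinct values.
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values")
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return next_power_of_10(statistics.mean(gaps))


def suggest_default(values: Sequence[Value], cardinality: float,
                    entity_count: float,
                    avg_suppression_threshold: float) -> Optional[Tuple[str, float]]:
    # Returns None when the column does not need a generalization default.
    if not needs_generalization(cardinality, entity_count, avg_suppression_threshold):
        return None
    if all(isinstance(v, str) for v in values):
        return ("substring", default_substring_length(values))
    return ("bin_size", default_bin_size(values))
```

For example, with 500 entities and an average threshold of 5, a column with an estimated 1000 distinct values exceeds the 100-bucket budget, so preview values `[0, 2, 4, 9]` (mean gap 3) yield a default bin size of 10.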

@cristianberneanu cristianberneanu added the enhancement New feature or request label Dec 2, 2021