Provide smart defaults for generalization. #281

cristianberneanu · 2021-12-02T13:22:58Z

When a high-cardinality column is selected, the resulting output will consist mostly of suppressed buckets. It will also usually take a long time to compute. This makes for a poor user experience.

I think it would be worthwhile to get some estimates from the file preview rows and compute smart defaults for generalization from those. What currently comes to mind:

If the column cardinality is higher than the entity count divided by the average suppression threshold, we provide a default that generalizes it to a space an order of magnitude smaller:
In the case of text columns, we get the average length and we subtract 1.
In the case of numeric columns, we get the average difference between two values and we round it to the next power of 10.
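
To make the proposal concrete, here is a rough Python sketch of the heuristic. It is not taken from the codebase: the function names are hypothetical, and it assumes that "average length minus 1" means a default substring length, that "average difference between two values" means the average gap between adjacent sorted distinct values, and that "next power of 10" means rounding up.

```python
import math
from statistics import mean


def needs_generalization(values, entity_count, suppression_threshold):
    """Return True when the number of distinct values exceeds
    entity_count / suppression_threshold (the trigger described above)."""
    return len(set(values)) > entity_count / suppression_threshold


def default_text_length(values):
    """Assumed reading: default substring length = average length - 1,
    never going below 1."""
    avg_len = mean(len(v) for v in values)
    return max(1, round(avg_len) - 1)


def default_numeric_bin_size(values):
    """Assumed reading: average gap between adjacent sorted distinct values,
    rounded up to a power of 10. Returns None if there are fewer than two
    distinct values."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None
    # The sum of adjacent gaps telescopes to (max - min).
    avg_gap = (distinct[-1] - distinct[0]) / (len(distinct) - 1)
    return 10.0 ** math.ceil(math.log10(avg_gap))


# Example on a handful of preview rows (made-up values).
names = ["alice", "bob", "charlie"]
amounts = [1, 4, 9, 16, 25]
if needs_generalization(amounts, entity_count=20, suppression_threshold=5):
    print(default_text_length(names), default_numeric_bin_size(amounts))
```

Since the preview rows are only a sample, the cardinality and entity count computed from them would themselves be estimates, so the trigger condition is approximate.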

cristianberneanu added the enhancement New feature or request label Dec 2, 2021
