When a high-cardinality column is selected, the resulting output will consist of mostly suppressed buckets. It will also usually take a long time to compute. This makes for a poor user experience.
I think it would be worthwhile to derive some estimates from the file preview rows and compute smart defaults for generalization from them. What currently comes to mind (a rough sketch follows below):

- If the column cardinality is higher than the entity count divided by the average suppression threshold, we provide a default that generalizes the column to a space an order of magnitude smaller.
- For text columns, we take the average value length and subtract 1.
- For numeric columns, we take the average difference between two values and round it to the next power of 10.
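
To make the idea concrete, here is a minimal TypeScript sketch of the heuristic. All names (`suggestGeneralization`, `GeneralizationDefault`, `previewValues`, `entityCount`, `avgSuppressionThreshold`) are hypothetical, and interpreting the text default as a substring length and the numeric default as a bin size is my assumption, not an existing API:

```typescript
// Rough sketch only: names and the substring/bin-size interpretation
// of the defaults are assumptions, not part of the current codebase.

type GeneralizationDefault =
  | { kind: 'substring'; length: number } // keep this many leading characters
  | { kind: 'binSize'; size: number }     // bin numeric values to this size
  | null;                                 // column is fine as-is

function suggestGeneralization(
  previewValues: (string | number)[],
  entityCount: number,
  avgSuppressionThreshold: number
): GeneralizationDefault {
  const cardinality = new Set(previewValues).size;

  // Only kick in when the column has more distinct values than
  // entity count / average suppression threshold.
  if (cardinality <= entityCount / avgSuppressionThreshold) return null;

  if (typeof previewValues[0] === 'string') {
    // Text columns: average value length minus 1.
    const lengths = (previewValues as string[]).map((v) => v.length);
    const avgLength = lengths.reduce((a, b) => a + b, 0) / lengths.length;
    return { kind: 'substring', length: Math.max(1, Math.round(avgLength) - 1) };
  }

  // Numeric columns: average difference between sorted distinct values,
  // rounded up to the next power of 10.
  const sorted = [...new Set(previewValues as number[])].sort((a, b) => a - b);
  if (sorted.length < 2) return null;
  const avgDiff = (sorted[sorted.length - 1] - sorted[0]) / (sorted.length - 1);
  return { kind: 'binSize', size: 10 ** Math.ceil(Math.log10(avgDiff)) };
}

// Example: suggestGeneralization(previewColumn, 1000, 4) might return
// { kind: 'binSize', size: 100 } for a numeric column with widely spread values.
```

The preview rows are only a sample, so the point is not precision but picking a default that avoids a mostly-suppressed, slow-to-compute result on the first run; the user can still refine it afterwards.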