-
-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Truncate MCVs #8041
Truncate MCVs #8041
Conversation
#benchmark |
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of tests now return no mcv information. I'd make sure that we're not losing any test coverage as a result. Otherwise, this looks good.
start := m.Len() | ||
for i, v := range *m { | ||
if float64(v.cnt) >= cutoff { | ||
start = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should break out of the loop here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow up bug fixes here #8049
BoundVal: sql.Row{"i", int64(1)}, | ||
BoundCnt: 2, | ||
}}, | ||
}, | ||
{ | ||
name: "mcvs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets also add a test with multiple mcvs.
|
||
# setting variables doesn't hang or error | ||
dolt sql -q "SET @@persist.dolt_stats_auto_refresh_enabled = 1;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were these deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with the analyze below, we already test background loading figured this was overkill
Sort and truncate MCVs. Only keep values whose frequency is > twice the uniform frequency. This prevents us from manually summing non-outliers (which is expensive).