[statspro] Bootstrap database statistics once on startup #8036

Merged 12 commits on Jun 24, 2024

Contributor

@max-hoffman max-hoffman commented Jun 18, 2024

Load database statistics once on SQL engine startup. If auto refresh is enabled, the bootstrap is not performed. This behavior is on by default and can be turned off:

    dolt sql -q "set @@PERSIST.dolt_stats_bootstrap_enabled = 0;"

(Running the command above against non-empty tables will still bootstrap statistics once.)
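
A minimal sketch of the startup decision described above, assuming only what the description states: bootstrap is on by default and is skipped when auto refresh is enabled. The `statsConfig` type, its field names, and `shouldBootstrap` are hypothetical names for illustration, not the identifiers used in this PR, and the assumption that auto refresh defaults to off is mine.

```go
package main

import "fmt"

// statsConfig is a hypothetical holder for the two settings mentioned in
// the description. The field names are assumptions, not Dolt's own.
type statsConfig struct {
	BootstrapEnabled   bool // mirrors dolt_stats_bootstrap_enabled
	AutoRefreshEnabled bool // mirrors the auto refresh setting (name assumed)
}

// defaultStatsConfig reflects "behavior is on by default". Leaving auto
// refresh off by default is an assumption made for this sketch.
func defaultStatsConfig() statsConfig {
	return statsConfig{BootstrapEnabled: true}
}

// shouldBootstrap reports whether statistics would be loaded once on
// startup: bootstrap must be enabled and auto refresh must not be.
func shouldBootstrap(c statsConfig) bool {
	return c.BootstrapEnabled && !c.AutoRefreshEnabled
}

func main() {
	cases := []statsConfig{
		defaultStatsConfig(),
		{BootstrapEnabled: true, AutoRefreshEnabled: true},
		{BootstrapEnabled: false},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> bootstrap=%v\n", c, shouldBootstrap(c))
	}
}
```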

This includes a small change to the way we encode column types for stats. We previously split on a comma (","), but enums and other types can contain commas, so we now use a line break ("\n"). Stats written by older versions will fail to load with the newer version.
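
To illustrate why the separator matters, here is a small self-contained sketch. `encodeTypes` and `decodeTypes` are hypothetical helpers written for this example, not the functions changed in this PR, and the column types are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeTypes joins column type strings with the given separator.
// Hypothetical helper for illustration only.
func encodeTypes(types []string, sep string) string {
	return strings.Join(types, sep)
}

// decodeTypes splits an encoded string back into column type strings.
func decodeTypes(encoded, sep string) []string {
	return strings.Split(encoded, sep)
}

func main() {
	// Two columns: an integer column and an enum column.
	types := []string{"int", "enum('a','b','c')"}

	// Comma separator: the commas inside the enum definition look like
	// separators, so decoding yields 4 fragments instead of 2 types.
	byComma := decodeTypes(encodeTypes(types, ","), ",")
	fmt.Println(len(byComma), byComma)

	// Newline separator: these type strings contain no line breaks,
	// so the round trip recovers the original 2 entries.
	byNewline := decodeTypes(encodeTypes(types, "\n"), "\n")
	fmt.Println(len(byNewline), byNewline)
}
```

The same reasoning explains the compatibility note: a string encoded with the old comma separator contains no line breaks, so splitting it on "\n" yields a single entry rather than one entry per column.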

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
a362bdc ok 5937457
version total_tests
a362bdc 5937457
correctness_percentage
100.0

@max-hoffman max-hoffman force-pushed the max/stats-bootstrap branch from 279044b to 2fb4d54 June 20, 2024 16:05
@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
2fb4d54 ok 5937457
version total_tests
2fb4d54 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
b9910d3 ok 5937457
version total_tests
b9910d3 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
2849138 ok 5937457
version total_tests
2849138 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
4cde46a ok 5937457
version total_tests
4cde46a 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
2a0f836 ok 5937457
version total_tests
2a0f836 5937457
correctness_percentage
100.0

@max-hoffman max-hoffman requested a review from zachmu June 21, 2024 17:07
Member

@zachmu zachmu left a comment

This seems like a good feature but I'm not sure about the default. This will incur a large startup cost for bigger DBs.

    @@ -134,6 +168,9 @@ func (p *Provider) RefreshTableStats(ctx *sql.Context, table sql.Table, db strin
    // branchQualifiedDatabase returns a branch qualified database. If the database
    // is already branch suffixed no duplication is applied.
    func (p *Provider) branchQualifiedDatabase(db, branch string) string {
    	if branch == "" {
Member

Probably better to do this at the call site
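
For context, a rough sketch of what handling the empty branch at the call site could look like. This is not the code in this PR: the hunk above is truncated, so what the `if branch == ""` body does is not shown. The "/" separator, the free-function form of `branchQualifiedDatabase`, and the `resolveBranch` helper with its default-branch fallback are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// branchQualifiedDatabase appends "/<branch>" to db unless db already ends
// with that suffix. Free-function sketch of the behavior in the doc
// comment; the "/" separator is an assumption.
func branchQualifiedDatabase(db, branch string) string {
	suffix := "/" + branch
	if strings.HasSuffix(db, suffix) {
		return db
	}
	return db + suffix
}

// resolveBranch is a hypothetical call-site helper: it substitutes a
// default when no branch was given, so the function above never sees "".
func resolveBranch(branch, defaultBranch string) string {
	if branch == "" {
		return defaultBranch
	}
	return branch
}

func main() {
	// The caller resolves the empty branch before qualifying the name.
	fmt.Println(branchQualifiedDatabase("mydb", resolveBranch("", "main")))

	// An already-suffixed name is returned unchanged.
	fmt.Println(branchQualifiedDatabase("mydb/feature", "feature"))
}
```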

@max-hoffman
Contributor Author

I benchmarked the startup cost for this, and it seems like a similar penalty to rebuilding a journal index.

Testing on a 50-million-row database: startup without stats takes 1.5 minutes, startup with already-bootstrapped stats takes about 3 minutes, and the first bootstrap takes 20 minutes.

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
298b5f9 ok 5937457
version total_tests
298b5f9 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
486897d ok 5937457
version total_tests
486897d 5937457
correctness_percentage
100.0

@max-hoffman max-hoffman merged commit c5c05b7 into main Jun 24, 2024
21 checks passed
@max-hoffman max-hoffman deleted the max/stats-bootstrap branch June 24, 2024 20:41