Speed up calculation of the QualityReport #718

Closed
frances-h opened this issue Feb 10, 2025 · 0 comments · Fixed by #723
Comments

@frances-h
Contributor

Problem Description

Currently, calculating the QualityReport can take a long time in some cases because the ContingencySimilarity metric computes the entire contingency table for the real and synthetic data. Issue #716 will add the ability to subsample in the metric, which we should use when running the QualityReport.

Expected behavior

Once Issue #716 has been merged in, we should update the ColumnPairTrends property to use subsampling when computing the ContingencySimilarity metric. Since both the single-table and the multi-table reports use this same property, we should only need to update it here once to affect both reports.

Changes to Implement
In the ColumnPairTrends property, the _get_columns_and_metric method should now also return a kwarg dict. By default, the kwarg dict should be empty. If the selected metric is the ContingencySimilarity metric and the data contains over 50,000 rows, the kwarg dict should instead be {'num_rows_subsample': 50_000}.

Additionally, the _generate_details method should be updated to pass the kwarg dict returned from _get_columns_and_metric to the metric's compute_breakdown method.
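A rough sketch of the kwarg selection and forwarding described above. This is not the SDMetrics code: the two metric classes are stand-ins that only echo the kwargs they receive, `get_metric_kwargs` and `compute_pair` are hypothetical names, and taking the larger of the two row counts as "the data" is an assumption, since the issue does not say which table is measured.

```python
# Illustrative only: stand-in classes and hypothetical helper names.
SUBSAMPLE_THRESHOLD = 50_000


class ContingencySimilarity:
    """Stand-in for the real metric; echoes the kwargs it was given."""

    @staticmethod
    def compute_breakdown(real_data, synthetic_data, **kwargs):
        return {'score': 1.0, 'kwargs': kwargs}


class CorrelationSimilarity:
    """Stand-in for any other column-pair metric."""

    @staticmethod
    def compute_breakdown(real_data, synthetic_data, **kwargs):
        return {'score': 1.0, 'kwargs': kwargs}


def get_metric_kwargs(metric, num_rows):
    """Return the extra kwargs for a metric, given the row count of the data."""
    if metric is ContingencySimilarity and num_rows > SUBSAMPLE_THRESHOLD:
        return {'num_rows_subsample': SUBSAMPLE_THRESHOLD}
    # Default: an empty dict, so other metrics are called exactly as before.
    return {}


def compute_pair(metric, real_rows, synthetic_rows):
    """Mimic the forwarding step: pass the selected kwargs to compute_breakdown."""
    # Assumption: use the larger of the two row counts.
    num_rows = max(len(real_rows), len(synthetic_rows))
    kwargs = get_metric_kwargs(metric, num_rows)
    return metric.compute_breakdown(real_rows, synthetic_rows, **kwargs)
```

Returning an empty dict by default keeps the call site uniform: the caller always unpacks the dict with `**kwargs` and never needs to branch on the metric type.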

Testing
We should test that both the single- and multi-table quality reports use the subsampling version of the metric when applicable.
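One possible shape for such a test, using `unittest.mock` to check that the kwarg dict reaches `compute_breakdown`. The `run_metric` helper is hypothetical and stands in for the call made inside `_generate_details`; an actual test would patch the metric used by the report instead.

```python
from unittest.mock import Mock


def run_metric(metric, real_data, synthetic_data, metric_kwargs):
    # Hypothetical helper standing in for the call inside _generate_details.
    return metric.compute_breakdown(real_data, synthetic_data, **metric_kwargs)


def test_subsample_kwarg_is_forwarded():
    metric = Mock()
    metric.compute_breakdown.return_value = {'score': 0.9}

    result = run_metric(metric, 'real', 'synthetic', {'num_rows_subsample': 50_000})

    # The metric must receive the subsampling kwarg unchanged.
    metric.compute_breakdown.assert_called_once_with(
        'real', 'synthetic', num_rows_subsample=50_000
    )
    assert result == {'score': 0.9}


def test_no_kwargs_by_default():
    metric = Mock()

    run_metric(metric, 'real', 'synthetic', {})

    # With an empty kwarg dict, the call is identical to the old behavior.
    metric.compute_breakdown.assert_called_once_with('real', 'synthetic')
```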

@frances-h frances-h added feature request Request for a new feature feature:reports Related to any of the generated reports labels Feb 10, 2025
@R-Palazzo R-Palazzo self-assigned this Feb 12, 2025
@R-Palazzo R-Palazzo added this to the 0.19.0 milestone Feb 12, 2025