Speed up calculation of the QualityReport #718

frances-h · 2025-02-10T14:59:56Z

Problem Description

Currently, calculating the QualityReport can take a long time in some situations because the ContingencySimilarity metric computes the entire contingency table for the real and synthetic data. Issue #716 will add the ability to subsample in the metric, which we should use when running the QualityReport.

Expected behavior

Once Issue #716 has been merged in, we should update the ColumnPairTrends property to use subsampling when computing the ContingencySimilarity metric. Since both the single-table and the multi-table reports use this same property, we should only need to update it here once to affect both reports.

Changes to Implement
In the ColumnPairTrends property, the _get_columns_and_metric method should now also return a kwarg dict. By default, the kwarg dict should be empty. If the selected metric is ContingencySimilarity and the data contains over 50,000 rows, the kwarg dict should instead be {'num_rows_subsample': 50_000}.

Additionally, the _generate_details method should be updated to pass the kwarg dict returned from _get_columns_and_metric to the metric's compute_breakdown method.
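
A minimal sketch of the two changes above, not the actual SDMetrics code. The metric classes here are hypothetical stand-ins that only echo back the kwargs they receive, `get_metric_kwargs` and `compute_pair` are illustrative names, and the row count is assumed to be taken from the real data since the issue does not say which table is checked.

```python
SUBSAMPLE_THRESHOLD = 50_000  # row count from the issue text


class ContingencySimilarity:
    """Hypothetical stand-in for the real metric; returns the kwargs it was given."""

    @staticmethod
    def compute_breakdown(real_data, synthetic_data, **kwargs):
        return {'score': 1.0, 'kwargs': kwargs}


class CorrelationSimilarity:
    """Hypothetical stand-in for any other column-pair metric."""

    @staticmethod
    def compute_breakdown(real_data, synthetic_data, **kwargs):
        return {'score': 1.0, 'kwargs': kwargs}


def get_metric_kwargs(metric, num_rows):
    """Build the kwarg dict that _get_columns_and_metric would return with the metric."""
    if metric is ContingencySimilarity and num_rows > SUBSAMPLE_THRESHOLD:
        return {'num_rows_subsample': SUBSAMPLE_THRESHOLD}
    return {}  # default: no extra arguments


def compute_pair(metric, real_data, synthetic_data):
    """Forward the kwarg dict to compute_breakdown, as _generate_details would."""
    # Assumption: the threshold is checked against the real data's row count.
    kwargs = get_metric_kwargs(metric, len(real_data))
    return metric.compute_breakdown(real_data, synthetic_data, **kwargs)
```

Keeping the threshold decision in one place means the single-table and multi-table reports pick it up without separate changes, which matches the intent described under Expected behavior.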

Testing
We should test that both the single- and multi-table quality reports use the subsampling version of the metric when applicable.
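
One possible shape for such a test, sketched with `unittest.mock`. The `run_metric` helper is hypothetical and stands in for the report's per-pair computation; a real test would patch the metric's `compute_breakdown` and run the actual single-table and multi-table reports instead.

```python
from unittest.mock import Mock

THRESHOLD = 50_000  # row count from the issue text


def run_metric(metric, is_contingency, real_rows, synthetic_rows):
    """Hypothetical helper standing in for the report's per-pair computation."""
    kwargs = {}
    if is_contingency and len(real_rows) > THRESHOLD:
        kwargs = {'num_rows_subsample': THRESHOLD}
    return metric.compute_breakdown(real_rows, synthetic_rows, **kwargs)


def test_subsampling_applied_to_large_data():
    metric = Mock()
    real, synthetic = range(THRESHOLD + 1), range(THRESHOLD + 1)
    run_metric(metric, True, real, synthetic)
    # The mocked metric must have received the subsampling argument.
    metric.compute_breakdown.assert_called_once_with(
        real, synthetic, num_rows_subsample=THRESHOLD
    )


def test_no_subsampling_for_small_data():
    metric = Mock()
    real, synthetic = range(10), range(10)
    run_metric(metric, True, real, synthetic)
    # Below the threshold, no extra keyword arguments are passed.
    metric.compute_breakdown.assert_called_once_with(real, synthetic)
```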


frances-h added the feature request and feature:reports labels Feb 10, 2025

R-Palazzo self-assigned this Feb 12, 2025

R-Palazzo added this to the 0.19.0 milestone Feb 12, 2025

R-Palazzo mentioned this issue Feb 12, 2025

Speed up calculation of the QualityReport #723

Merged

R-Palazzo closed this as completed in #723 Feb 13, 2025
