Problem Description
Currently, calculating the QualityReport can take a long time in certain situations because the ContingencySimilarity metric computes the entire contingency table for the real and synthetic data. Issue #716 will add the ability to subsample in the metric, which we should use when running the QualityReport.
Expected behavior
Once Issue #716 has been merged in, we should update the ColumnPairTrends property to use subsampling when computing the ContingencySimilarity metric. Since both the single-table and the multi-table reports use this same property, we should only need to update it here once to affect both reports.
Changes to Implement
In the ColumnPairTrends property, the _get_columns_and_metric method should now also return a kwarg dict. By default, the kwarg dict should be empty. If the selected metric is ContingencySimilarity and the data contains over 50,000 rows, the kwarg dict should instead be {'num_rows_subsample': 50_000}.
Additionally, the _generate_details method should be updated to pass the kwarg dict returned from _get_columns_and_metric to the metric's compute_breakdown method.
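The two changes above could look roughly like the sketch below. It is only an illustration under stated assumptions: the ContingencySimilarity class here is a simplified stand-in rather than the SDMetrics implementation, the functions are written as free functions with hypothetical signatures instead of the property's private methods, tables are represented as lists of (value_1, value_2) tuples, and "the data contains over 50,000 rows" is read as "either table exceeds the limit".

```python
import random
from collections import Counter

SUBSAMPLE_THRESHOLD = 50_000  # row limit named in this issue


def _frequencies(rows):
    """Normalized contingency table: share of rows per (value_1, value_2) pair."""
    total = len(rows)
    return {pair: count / total for pair, count in Counter(rows).items()}


def _subsample(rows, num_rows):
    """Randomly keep at most num_rows rows."""
    return random.sample(rows, num_rows) if len(rows) > num_rows else rows


class ContingencySimilarity:
    """Simplified stand-in for the metric (assumption, not the SDMetrics code)."""

    @staticmethod
    def compute_breakdown(real_data, synthetic_data, num_rows_subsample=None):
        if num_rows_subsample is not None:
            real_data = _subsample(real_data, num_rows_subsample)
            synthetic_data = _subsample(synthetic_data, num_rows_subsample)
        real_freq = _frequencies(real_data)
        synth_freq = _frequencies(synthetic_data)
        pairs = set(real_freq) | set(synth_freq)
        # Score is 1 minus the total variation distance between the two tables.
        distance = 0.5 * sum(
            abs(real_freq.get(p, 0) - synth_freq.get(p, 0)) for p in pairs
        )
        return {'score': 1 - distance}


def get_columns_and_metric(real_data, synthetic_data, metric):
    """Return the data, the metric, and the kwarg dict for compute_breakdown."""
    kwargs = {}  # default: no extra arguments
    too_large = max(len(real_data), len(synthetic_data)) > SUBSAMPLE_THRESHOLD
    if metric is ContingencySimilarity and too_large:
        kwargs = {'num_rows_subsample': SUBSAMPLE_THRESHOLD}
    return real_data, synthetic_data, metric, kwargs


def generate_details(real_data, synthetic_data, metric):
    """Forward the kwarg dict to the metric's compute_breakdown."""
    real, synthetic, metric, kwargs = get_columns_and_metric(
        real_data, synthetic_data, metric
    )
    return metric.compute_breakdown(real, synthetic, **kwargs)
```

Because the kwarg dict is empty by default, other metrics keep their existing call signature, and only ContingencySimilarity on large tables receives num_rows_subsample.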
Testing
We should test that both the single- and multi-table quality reports use the subsampling version of the metric when applicable.
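One possible shape for such a test is to patch the metric and inspect the keyword arguments it receives. The sketch below is self-contained and therefore uses hypothetical stand-ins (ContingencySimilarity, generate_details, recorded_kwargs) rather than the real report classes; the actual tests would run the single- and multi-table reports and make the same kind of assertion on the patched metric.

```python
from unittest.mock import patch

SUBSAMPLE_THRESHOLD = 50_000  # row limit named in this issue


class ContingencySimilarity:
    """Hypothetical stand-in; the real metric lives in SDMetrics."""

    @staticmethod
    def compute_breakdown(real_data, synthetic_data, **kwargs):
        return {'score': 1.0}


def generate_details(real_data, synthetic_data, metric=ContingencySimilarity):
    """Hypothetical stand-in for the property's detail generation."""
    kwargs = {}
    if metric is ContingencySimilarity and len(real_data) > SUBSAMPLE_THRESHOLD:
        kwargs = {'num_rows_subsample': SUBSAMPLE_THRESHOLD}
    return metric.compute_breakdown(real_data, synthetic_data, **kwargs)


def recorded_kwargs(num_rows):
    """Run generate_details on num_rows rows; return the kwargs the metric got."""
    rows = [('a', 'x')] * num_rows
    with patch.object(ContingencySimilarity, 'compute_breakdown') as mock_breakdown:
        generate_details(rows, rows)
    return mock_breakdown.call_args.kwargs


def test_subsampling_only_above_threshold():
    assert recorded_kwargs(10) == {}
    assert recorded_kwargs(SUBSAMPLE_THRESHOLD + 1) == {
        'num_rows_subsample': SUBSAMPLE_THRESHOLD
    }
```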