-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky curve fitting tests #6916
Conversation
The test function ‘test_calibrate_z_phases()‘ in the file ‘cirq-core/cirq/experiments/z_phase_calibration_test.py‘ eventually ends up calling `fit_exponential_decays()` in `xeb_fitting.py`. That function uses SciPy's `curve_fit()` to fit an exponential function to data. For some unlucky combination of random seed and problem cases, `curve_fit()` can fail to find a solution, and raises `RuntimeError` when that happens. This change wraps the call to `curve_fit()` with a try-catch for `RuntimeError`; when it happens, the wrapper calls `curve_fit()` one more time with an explicit setting of higher max function evaluations. If that fails too, then the resulting new `RuntimeError` will not be caught. This is admittedly not a true solution to the problem. In my testing, this prevents the original CI error (with a particular random seed), but it's probably the case that the second run of `curve_fit()` could get unlucky again. Still, this may be good enough for most cases in practice. If we continue to see failures, we should explore a more sophisticated solution to this.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6916 +/- ##
=======================================
Coverage 97.86% 97.86%
=======================================
Files 1084 1084
Lines 94309 94314 +5
=======================================
+ Hits 92298 92304 +6
+ Misses 2011 2010 -1 ☔ View full report in Codecov by Sentry. |
def do_curve_fit(): | ||
curve_fit = partial( | ||
optimize.curve_fit, | ||
exponential_decay, | ||
cycle_depths, | ||
fidelities, | ||
p0=(a_0, layer_fid_0), | ||
bounds=((0, 0), (1, 1)), | ||
) | ||
try: | ||
return curve_fit() | ||
except RuntimeError: # pragma: no cover | ||
# Curve_fit didn't find a solution. Try once more w/ higher maxfev. | ||
# Default (in SciPy v.1.14) is 100*(1+len(p0)) = 300 for our p0. | ||
return curve_fit(maxfev=1000) | ||
|
||
try: | ||
(a, layer_fid), pcov = do_curve_fit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in 0b6cccd is effectively the same as adding a maxfev=1000
argument. Let us just keep it simple.
def do_curve_fit(): | |
curve_fit = partial( | |
optimize.curve_fit, | |
exponential_decay, | |
cycle_depths, | |
fidelities, | |
p0=(a_0, layer_fid_0), | |
bounds=((0, 0), (1, 1)), | |
) | |
try: | |
return curve_fit() | |
except RuntimeError: # pragma: no cover | |
# Curve_fit didn't find a solution. Try once more w/ higher maxfev. | |
# Default (in SciPy v.1.14) is 100*(1+len(p0)) = 300 for our p0. | |
return curve_fit(maxfev=1000) | |
try: | |
(a, layer_fid), pcov = do_curve_fit() | |
try: | |
(a, layer_fid), pcov = optimize.curve_fit( | |
exponential_decay, | |
cycle_depths, | |
fidelities, | |
p0=(a_0, layer_fid_0), | |
bounds=((0, 0), (1, 1)), | |
maxfev=1000, | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of repeating the fit, let us just run it once with a larger maxfev.
So, the reason for the 2-step approach was the following. I noticed that when it was restarted, the second time around it seemed to take far fewer function evaluations. I admit I don't know if that's because it picks a different starting point, or I got lucky in the few times I inspected it, or what. But, when it failed, it seemed to do better the second time around. Since (a) 1000 was an arbitrary number and might not even be large enough for all cases, and (b) second runs after failures seemed to do better, I reasoned that this was a safer choice. ("Safer" in the sense of being more likely to lead to a finish and not another failure.) This is admittedly weak reasoning, and not entirely logical. (E.g., maybe it can be just as unlucky the 2nd time around and not finish in 1000 steps.) If you think this is not really sensible or worth it, it's fine with me to change it to the simpler one-shot version. |
The 2 curve_fit calls are reproducible with the initial parameters given by the p0 argument. |
OK, let's change it. I'll commit the changes later today. |
Sounds good, please make sure to update the commit message as needed. |
Discussions during code review resulted in the decision to not do a try-fail-retry approach, and instead simply call `curve_fit()` once but with an explicit, high value of `maxfev`. This solution also work on the original case. To check, here is the command that fails without this change and succeeds with the change: (on a Debian-based Linux system, Python 3.11, NumPy 1.24, and SciPy 1.15.0): ```bash ./check/pytest -v --randomly-seed=3258636985 \ cirq-core/cirq/experiments/z_phase_calibration_test.py::test_calibrate_z_phases_no_options ``` The above should be executed from the top of the Cirq source tree.
The test function
test_calibrate_z_phases()
in the filecirq-core/cirq/experiments/z_phase_calibration_test.py
eventually ends up callingfit_exponential_decays()
inxeb_fitting.py
. That function uses SciPy'scurve_fit()
to fit an exponential function to data. For some unlucky combination of random seed and problem cases,curve_fit()
can fail to find a solution, and raisesRuntimeError
when that happens.This PR simply changes the call to
curve_fit()
to use a higher number for the maximum function evaluations. The following is a test case that fails without the change and succeeds with it:Fixes #6906.