Fix flaky curve fitting tests #6916

Merged: 6 commits into quantumlib:main, Jan 8, 2025

Conversation

@mhucka (Contributor) commented Jan 4, 2025

The test function test_calibrate_z_phases() in the file cirq-core/cirq/experiments/z_phase_calibration_test.py eventually ends up calling fit_exponential_decays() in xeb_fitting.py. That function uses SciPy's curve_fit() to fit an exponential function to data. For some unlucky combination of random seed and problem cases, curve_fit() can fail to find a solution, and raises RuntimeError when that happens.

This PR simply changes the call to curve_fit() to use a higher number for the maximum function evaluations. The following is a test case that fails without the change and succeeds with it:

./check/pytest -v --randomly-seed=3258636985 \
cirq-core/cirq/experiments/z_phase_calibration_test.py::test_calibrate_z_phases_no_options

Fixes #6906.
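
For context, here is a minimal, hypothetical sketch (not the Cirq code; the model, data, starting values, and `maxfev` value are illustrative) of the kind of exponential-decay fit that `fit_exponential_decays()` performs, with `maxfev` passed explicitly so that `curve_fit()` gets a larger evaluation budget before it gives up with a `RuntimeError`:

```python
# Hypothetical sketch of an exponential-decay fit like the one in xeb_fitting.py.
# The data, p0, and maxfev values here are made up for illustration.
import numpy as np
from scipy import optimize

def exponential_decay(cycle_depths, a, layer_fid):
    # Assumed model for illustration: fidelity ~ a * layer_fid ** cycle_depths
    return a * layer_fid**cycle_depths

cycle_depths = np.array([5, 25, 50, 100, 200])
fidelities = 0.95 * 0.99**cycle_depths  # fabricated, noise-free example data

# curve_fit raises RuntimeError when it exhausts its evaluation budget without
# converging; a larger maxfev gives it more room before giving up.
(a, layer_fid), pcov = optimize.curve_fit(
    exponential_decay,
    cycle_depths,
    fidelities,
    p0=(0.5, 0.5),
    bounds=((0, 0), (1, 1)),
    maxfev=1000,
)
print(a, layer_fid)  # close to 0.95 and 0.99 for this synthetic data
```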

The test function `test_calibrate_z_phases()` in the file
`cirq-core/cirq/experiments/z_phase_calibration_test.py` eventually
ends up calling `fit_exponential_decays()` in `xeb_fitting.py`. That
function uses SciPy's `curve_fit()` to fit an exponential function to
data. For some unlucky combination of random seed and problem cases,
`curve_fit()` can fail to find a solution, and raises `RuntimeError`
when that happens.

This change wraps the call to `curve_fit()` with a try/except for
`RuntimeError`; when it happens, the wrapper calls `curve_fit()` one
more time with an explicit setting of higher max function evaluations.
If that fails too, then the resulting new `RuntimeError` will not be
caught.

This is admittedly not a true solution to the problem. In my testing,
this prevents the original CI error (with a particular random seed),
but it's probably the case that the second run of `curve_fit()` could
get unlucky again. Still, this may be good enough for most cases in
practice.

If we continue to see failures, we should explore a more sophisticated
solution to this.
@CirqBot added the `size: S 10< lines changed <50` label on Jan 4, 2025

codecov bot commented Jan 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (3e16e15) to head (5d5a896).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6916   +/-   ##
=======================================
  Coverage   97.86%   97.86%           
=======================================
  Files        1084     1084           
  Lines       94309    94314    +5     
=======================================
+ Hits        92298    92304    +6     
+ Misses       2011     2010    -1     


@mhucka marked this pull request as ready for review January 4, 2025 01:09
@mhucka requested review from mrwojtek, vtomole and a team as code owners January 4, 2025 01:09
@mhucka requested review from viathor and pavoljuhas January 4, 2025 01:09
Comment on lines 651 to 668
def do_curve_fit():
    curve_fit = partial(
        optimize.curve_fit,
        exponential_decay,
        cycle_depths,
        fidelities,
        p0=(a_0, layer_fid_0),
        bounds=((0, 0), (1, 1)),
    )
    try:
        return curve_fit()
    except RuntimeError:  # pragma: no cover
        # Curve_fit didn't find a solution. Try once more w/ higher maxfev.
        # Default (in SciPy v.1.14) is 100*(1+len(p0)) = 300 for our p0.
        return curve_fit(maxfev=1000)

try:
    (a, layer_fid), pcov = do_curve_fit()
@pavoljuhas (Collaborator) commented Jan 7, 2025

The change in 0b6cccd is effectively the same as adding a maxfev=1000 argument. Let us just keep it simple.

Suggested change
Removed:

def do_curve_fit():
    curve_fit = partial(
        optimize.curve_fit,
        exponential_decay,
        cycle_depths,
        fidelities,
        p0=(a_0, layer_fid_0),
        bounds=((0, 0), (1, 1)),
    )
    try:
        return curve_fit()
    except RuntimeError:  # pragma: no cover
        # Curve_fit didn't find a solution. Try once more w/ higher maxfev.
        # Default (in SciPy v.1.14) is 100*(1+len(p0)) = 300 for our p0.
        return curve_fit(maxfev=1000)
try:
    (a, layer_fid), pcov = do_curve_fit()

Added:

try:
    (a, layer_fid), pcov = optimize.curve_fit(
        exponential_decay,
        cycle_depths,
        fidelities,
        p0=(a_0, layer_fid_0),
        bounds=((0, 0), (1, 1)),
        maxfev=1000,
    )

@pavoljuhas changed the title from "Fix issue #6906 – flaky curve fitting tests" to "Fix flaky curve fitting tests" on Jan 7, 2025
@pavoljuhas (Collaborator) left a comment

Instead of repeating the fit, let us just run it once with a larger maxfev.

@mhucka (Contributor, Author) commented Jan 7, 2025

So, the reason for the 2-step approach was the following. I noticed that when it was restarted, the second time around it seemed to take far fewer function evaluations. I admit I don't know if that's because it picks a different starting point, or I got lucky in the few times I inspected it, or what. But, when it failed, it seemed to do better the second time around. Since (a) 1000 was an arbitrary number and might not even be large enough for all cases, and (b) second runs after failures seemed to do better, I reasoned that this was a safer choice. ("Safer" in the sense of being more likely to lead to a finish and not another failure.)

This is admittedly weak reasoning, and not entirely logical. (E.g., maybe it can be just as unlucky the 2nd time around and not finish in 1000 steps.)

If you think this is not really sensible or worth it, it's fine with me to change it to the simpler one-shot version.

@pavoljuhas (Collaborator)

> I noticed that when it was restarted, the second time around it seemed to take far fewer function evaluations.

The two curve_fit calls are reproducible given the initial parameters in the p0 argument.
If the second call were changed to curve_fit() without maxfev, the test would fail.
The one-call version with a large maxfev should actually be faster, because it does not need to repeat the first 300 or so iterations.
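
As a small illustration of this point (with made-up data, not from the PR): given the same data and the same `p0`, `curve_fit()` is deterministic, so a retry after a `RuntimeError` replays the same evaluations rather than getting a fresh chance.

```python
# Sketch with fabricated data: identical curve_fit calls yield identical results,
# so a restart after a failure repeats work instead of "getting lucky".
import numpy as np
from scipy import optimize

def exponential_decay(cycle_depths, a, layer_fid):
    return a * layer_fid**cycle_depths

x = np.arange(1, 50)
y = 0.9 * 0.98**x

kwargs = dict(p0=(0.5, 0.5), bounds=((0, 0), (1, 1)))
fit1, _ = optimize.curve_fit(exponential_decay, x, y, **kwargs)
fit2, _ = optimize.curve_fit(exponential_decay, x, y, **kwargs)
assert np.allclose(fit1, fit2)  # same inputs, same fitted (a, layer_fid)
```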

@mhucka (Contributor, Author) commented Jan 7, 2025

OK, let's change it. I'll commit the changes later today.

@pavoljuhas (Collaborator) commented Jan 8, 2025

Sounds good, please make sure to update the commit message as needed.

mhucka and others added 3 commits January 7, 2025 19:25
Discussions during code review resulted in the decision to not do a
try-fail-retry approach, and instead simply call `curve_fit()` once
but with an explicit, high value of `maxfev`.

This solution also works for the original failure case. To check, here is
the command that fails without this change and succeeds with it (tested on
a Debian-based Linux system with Python 3.11, NumPy 1.24, and SciPy 1.15.0):

```bash
./check/pytest -v --randomly-seed=3258636985 \
 cirq-core/cirq/experiments/z_phase_calibration_test.py::test_calibrate_z_phases_no_options
```

The above should be executed from the top of the Cirq source tree.
@pavoljuhas merged commit 9ada8a8 into quantumlib:main on Jan 8, 2025
37 checks passed
Labels: size: S 10< lines changed <50
Projects: None yet
Development: Successfully merging this pull request may close these issues: Fix flaky CI checks on Ubuntu
Participants: 3