Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible performance regression on Windows #347

Open
nwalfield opened this issue Jan 26, 2024 · 5 comments
Open

Possible performance regression on Windows #347

nwalfield opened this issue Jan 26, 2024 · 5 comments

Comments

@nwalfield
Copy link

nwalfield commented Jan 26, 2024

As part of our release process, we update all of our dependencies. Prior to starting the release process for sequoia-openpgp 1.18.0, we had anyhow 1.0.75 in our Cargo.lock file. cargo update updated anyhow to the latest version, 1.0.79.

Our CI is configured to build Sequoia in various configurations on Linux, and on Windows. We have two Windows jobs: windows-msvc-cng, which uses MSVC in 64-bit mode, and windows-msvc32-cng, which uses MSVC in 32-bit mode. After applying this change, the time to run the windows-msvc-cng went from ~4 minutes to 23 minutes, and the windows-msvc32-cng went from 3 minutes to not finishing after ~3 hours.

At first we thought it was the Windows VM, but retrying old pipelines worked in the sense that the Windows jobs finished in the expected amount of time. We then had the theory that some package has a performance regression. I spent some time updating a few packages at a time, and seeing what happens with the CI jobs. When the performance problem kicked in, I removed packages until I found a package that induces the performance problem. It turned out that it was anyhow.

I then tried to figure out what version of anyhow first induced the problem. 1.0.76 is fine. Using 1.0.77 however exhibits the performance problem.

Here's what the CI output from the 1.0.77 log looks like:

...
test cert::parser::test::parse_keyring_interleaved_unrecoverable_junk has been running for over 60 seconds
test cert::parser::test::parse_keyring_no_public_key has been running for over 60 seconds
test cert::parser::test::parse_keyring_simple has been running for over 60 seconds
test cert::raw::test::a_cert has been running for over 60 seconds
...

Here are the tests, which don't look suspicious to me:

Some details of the environment:

**********************************************************************
** Visual Studio 2019 Developer Command Prompt v16.11.33
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
This Docker image is exclusively intended to be run in the continuous integration
pipeline for building and contributing to the open-source Sequoia project
(https://sequoia-pgp.org/) and bundles Visual C++ 2019 Build Tools to do so.
Please be advised that the Build Tools is licensed as a supplement to the
Visual Studio license. To use this software you have to have a valid Visual
Studio license.
$ rustc --version --verbose
rustc 1.70.0 (90c541806 2023-05-31)
binary: rustc
commit-hash: 90c541806f23a127002de5b4038be731ba1458ca
commit-date: 2023-05-31
host: x86_64-pc-windows-msvc
release: 1.70.0
LLVM version: 16.0.2
$ cargo --version
cargo 1.70.0 (ec8a8a0ca 2023-04-25)
@nwalfield
Copy link
Author

I just 1.0.80, and I'm seeing the same performance issue. Reverting back to 1.0.76 fixes the issue.

@nwalfield
Copy link
Author

nwalfield commented Feb 20, 2024

I traced the regression down to d6acd22.

I added the following to our Cargo.toml:

[patch.crates-io]
anyhow = { git = 'https://github.com/dtolnay/anyhow.git', rev = 'b91db55c4eb21611832bba2eef4300d2816e1d04' }

And then changed the revision until I found the earliest commit (according to git log) that exhibits the performance issue in our CI. That was d6acd22200de94b65804e08ca41a4bdff3404512. The preceding commit 5121cd2, appears to be fine.

@decathorpe
Copy link

decathorpe commented Feb 29, 2024

This looks like the backtrace functionality that was stabilized in the Rust std is performing differently / worse than the one in the external backtrace crate, especially on the x86_64-pc-windows-msvc target?


EDIT: Looks like with Rust >= 1.65, backtrace support is now enabled unconditionally, so rather than things being slower with std than with the external backtrace crate, it looks like the cause might actually be that backtrace support is now enabled unconditionally whereas it wasn't enabled before (sequoia-openpgp does not use the "backtrace" feature of anyhow).

@LPGhatguy
Copy link

LPGhatguy commented Mar 19, 2024

Hello, I just stubbed my toe on this too and wasted a few hours combing through our dependency upgrades.

This change ruins the performance of our game on Windows. Capturing a single backtrace on Windows takes enough time to cause noticeable hitching. That's a huge amount of time spent on backtraces that my users aren't even seeing!

Edit: It seems like setting RUST_LIB_BACKTRACE=0 will force backtrace captures off on errors and fix all of the performance lost.

@workingjubilee
Copy link

workingjubilee commented Mar 19, 2024

The ~current nightly version of the standard library should have performance improvements for obtaining backtraces, specifically on Windows. I would be interested in knowing how bad the regression is, comparatively, in applications compiled with the recent nightly stdlib (versus those compiled with the latest stable, which will be slightly behind that).

Note that I do expect it will still be very harsh, but it will at least use a less... clumsy, ancient, deprecated API for obtaining the trace.

popcnt1 pushed a commit to popcnt1/rooch that referenced this issue Nov 10, 2024
new version of anyhow with std backtrace is much slower: dtolnay/anyhow#347

```
cargo bench --bench bench_utils

...

anyhow_error_bench/anyhow_v1.0.93
                        time:   [15.040 ns 15.148 ns 15.259 ns]
                        change: [+1.5448% +2.1215% +2.7219%] (p = 0.00 < 0.05)
                        Performance has regressed.
anyhow_error_bench/anyhow_v1.0.76
                        time:   [6.2104 ns 6.2182 ns 6.2278 ns]
                        change: [-0.4029% -0.1998% +0.0002%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
```
jolestar pushed a commit to rooch-network/rooch that referenced this issue Nov 10, 2024
* feat(rooch-benchmarks): bench anyhow with/without std backtrace

new version of anyhow with std backtrace is much slower: dtolnay/anyhow#347

```
cargo bench --bench bench_utils

...

anyhow_error_bench/anyhow_v1.0.93
                        time:   [15.040 ns 15.148 ns 15.259 ns]
                        change: [+1.5448% +2.1215% +2.7219%] (p = 0.00 < 0.05)
                        Performance has regressed.
anyhow_error_bench/anyhow_v1.0.76
                        time:   [6.2104 ns 6.2182 ns 6.2278 ns]
                        change: [-0.4029% -0.1998% +0.0002%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
```

* fix(deps): downgrade anyhow version to 1.0.76

Downgraded anyhow dependency from version 1.0.93 to 1.0.76. Won't break anything but gain better perf.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants