factor::table: Implement a batched version w/ improved performance #2143
Conversation
Rebased to deal with merge conflict, applied …
Fixed the benchmark not to rely on unstable features; surprisingly, this slightly improved the performance of the benchmark code, …
Please create a BENCHMARKING.md file like src/uu/ls/BENCHMARKING.md to reproduce your benchmark :)
OK? Though I'm not sure what I'd say beyond “run …”
This is good enough :)
Instead, we could run the µbenchmarks in a simulated CPU with [`cachegrind`], measure execution “time” in that model (in CI), and use it to detect and report performance improvements and regressions.
@sylvestre Do you agree it would make sense to use something like iai
in factor's CI, and automatically report significant performance changes on the PR's discussion?
Why not? Are you aware of any success stories with iai?
Not yet, but the tool is very new. I seem to recall (non-Rust) projects having decent success using cachegrind
in CI that way, though.
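As a sketch, the cachegrind-based workflow discussed above could look like the following; the bench binary name and hash are hypothetical, as cargo appends a build-specific hash:

```shell
# Build the benchmark binary without running it
cargo bench --no-run

# Run it under cachegrind's simulated CPU; instruction counts are
# deterministic across runs, unlike wall-clock time on shared CI runners
# (the path/hash below is hypothetical)
valgrind --tool=cachegrind target/release/deps/table-1a2b3c4d --bench

# Summarize per-function instruction and cache-miss counts
cg_annotate cachegrind.out.<pid>
```

CI could archive the resulting counts and diff them against the base branch to flag significant changes on the PR.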
This will be easier to adapt to working with multiple numbers to process at once.
The factor_chunk implementation is a strawman, but getting it in place allows us to set up the microbenchmarking etc.
This keeps the traversal of `P_INVS_U64` (a large table) to a single pass in-order, rather than `CHUNK_SIZE` passes.
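The loop-interchange idea can be sketched on a toy table; the names and sizes here are illustrative stand-ins, not the real `P_INVS_U64` layout or the crate's actual `factor_chunk` signature:

```rust
const CHUNK_SIZE: usize = 4;

// Hypothetical miniature stand-in for the large precomputed table:
// a short, in-order list of small primes.
const TABLE: [u64; 5] = [2, 3, 5, 7, 11];

// One in-order pass over the table, visiting every number in the chunk at
// each entry, instead of CHUNK_SIZE full passes (one per number).
fn factor_chunk(chunk: &mut [u64; CHUNK_SIZE]) -> [Vec<u64>; CHUNK_SIZE] {
    let mut factors: [Vec<u64>; CHUNK_SIZE] = Default::default();
    for &p in TABLE.iter() {
        // Inner loop is over the chunk, so the table is traversed only once.
        for (n, fs) in chunk.iter_mut().zip(factors.iter_mut()) {
            while *n % p == 0 {
                *n /= p;
                fs.push(p);
            }
        }
    }
    factors
}

fn main() {
    let mut chunk = [12, 35, 77, 1];
    let factors = factor_chunk(&mut chunk);
    println!("{:?}", factors); // [[2, 2, 3], [5, 7], [7, 11], []]
}
```

The payoff is locality: each table entry is loaded once and amortized over the whole chunk, rather than re-streaming the table per number.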
The previous version performed an amount of work proportional to `CHUNK_SIZE`, so it wasn't a valid way to benchmark across multiple values of that constant. The `TryInto` implementation from `&mut [T]` to `&mut [T; N]` relies on `const` generics, and is available in (stable) Rust v1.51 and later.
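A minimal illustration of that slice-to-array conversion; the function and values are hypothetical:

```rust
use std::convert::TryInto;

// Doubles every element, processing the slice in fixed-size chunks.
// `chunks_exact_mut` leaves any remainder untouched.
fn double_in_place(ns: &mut [u64]) {
    for chunk in ns.chunks_exact_mut(4) {
        // Stable since Rust 1.51: const-generic TryInto from &mut [T]
        // to &mut [T; N]. Infallible here because the chunk length is 4.
        let chunk: &mut [u64; 4] = chunk.try_into().unwrap();
        for n in chunk.iter_mut() {
            *n *= 2;
        }
    }
}

fn main() {
    let mut v = [1u64, 2, 3, 4, 5];
    double_in_place(&mut v);
    println!("{:?}", v); // [2, 4, 6, 8, 5]
}
```

Getting a `&mut [T; N]` lets the compiler see the fixed length, which helps it unroll or vectorize the inner loop.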
Adding   array-init v2.0.0
Updating cast v0.2.5 -> v0.2.6
Adding   pest v2.1.3
Updating rustc_version v0.2.3 -> v0.3.3
Adding   semver v0.11.0
Adding   semver-parser v0.10.2
Updating serde v1.0.125 -> v1.0.126
Updating serde_derive v1.0.125 -> v1.0.126
Adding   ucd-trie v0.1.3
Adding   uu_factor_benches v0.0.0 (#tests/benches/factor)
Rebase to deal with merge conflict in …
Tests are blocked on BurntSushi/quickcheck#282, so let's move ahead for now.
CI breakage is unrelated: …
- Implement `factor::table::factor_chunk`
- Fix the `table` benchmark to work on stable Rust
- Add tests

Future work:
- A high-level API turning an `Iterator<u64>` into an `Iterator<Factors>`, wrapping `factor_chunk` and automatically using the batched/vectorized implementation internally.
- Use `factor::table::factor_chunk` in the CLI application (presumably via the high-level API)
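The future-work item above might look roughly like the following sketch. `Factors`, `factor_one`, and the chunking strategy are stand-ins for illustration, not the crate's actual types or API:

```rust
use std::convert::TryInto;

const CHUNK_SIZE: usize = 4;

// Stand-in for uu_factor's Factors type: a sorted list of prime factors.
type Factors = Vec<u64>;

// Stand-in for a scalar factorization routine (naive trial division).
fn factor_one(mut n: u64) -> Factors {
    let mut fs = Vec::new();
    let mut p = 2;
    while p * p <= n {
        while n % p == 0 {
            fs.push(p);
            n /= p;
        }
        p += 1;
    }
    if n > 1 {
        fs.push(n);
    }
    fs
}

// Strawman batched version; the real one would share a single table pass.
fn factor_chunk(chunk: &[u64; CHUNK_SIZE]) -> [Factors; CHUNK_SIZE] {
    chunk.map(factor_one)
}

// Proposed high-level shape: consume u64s, batch them into full chunks for
// the vectorized path, and fall back to the scalar path for the remainder.
fn factor_stream(nums: impl IntoIterator<Item = u64>) -> Vec<Factors> {
    let mut out = Vec::new();
    let mut buf: Vec<u64> = Vec::with_capacity(CHUNK_SIZE);
    for n in nums {
        buf.push(n);
        if buf.len() == CHUNK_SIZE {
            let chunk: &[u64; CHUNK_SIZE] = buf[..].try_into().unwrap();
            out.extend(factor_chunk(chunk));
            buf.clear();
        }
    }
    out.extend(buf.into_iter().map(factor_one));
    out
}

fn main() {
    println!("{:?}", factor_stream([12, 35, 77, 101, 9]));
    // [[2, 2, 3], [5, 7], [7, 11], [101], [3, 3]]
}
```

A production version would presumably return a lazy `impl Iterator<Item = Factors>` rather than a `Vec`, so the CLI could stream results as they are computed.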