-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Waker optimization + O(woken) polling for every combinator except chain #115
Conversation
wow, thank you for this! I'll try to make time to review your changes soon -- but I loved the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for this wonderful work!
I think this overall improves the codebase and seems to be correct.
But, as this brings a lot of breaking changes, I prefer to have a second word from @yoshuawuyts
We'll need a new major after merging, and we probably should try to work on some details before the release, like improving error handling (happy_eyeballs
exemplifies that returning Vec<Error>
is not very Try
-friendly), trying to iterate in a more fair fashion (in the old Indexer
sense, as I understand, this PR removes it), and seeing if we would benefit of bitvec
in some points. We can iterate on this in small steps after the merge.
Thanks for the review!. It is true that removing AggregateError makes error handling more verbose. Might not have been a good idea on my part. For fairness, things can be complicated Consider these implementations
Situation A async fn work() {
for _ in 0..N {
shared_mutex.lock().await;
}
}
(work(), work()).race().await; Here design 4 is the only fair one. With 0/1/2/3, the left subfuture wins. Situation B async fn work() {
for _ in 0..N {
shared_mutex.lock().await;
yield_now().await;
}
}
(work(), work()).race().await; Here design 4 is fair as before. 0 and 2 prefer the left subfuture. For 1 and 3, the winner depends on the constant N being odd or even. Situation C async fn work(time: f32) {
for _ in 0..N {
sleep(time).await;
}
}
(work(1.00), work(1.01)).race().await; Here, the left subfuture wins in all implementation. Situation D Here 0, 2, and 4 prefer the left subfuture. 3 prefers the right subfuture (!). 1 depends on oddness/evenness of N. So the RNG solution (design 4) is the clear winner. But it comes with the RNG's complexity and nondeterminism. Apart from 4,
But this PR is already too big as it is! So as you said, let's iterate on fairness later. |
@wishawa thanks for this PR! - I think there are some interesting things here that I would like to tease apart, possibly into different PRs. I'm trying to summarize the work you've done, so far I'm seeing the following changes:
This PR by itself is probably too big to review in-depth on its own. However, some of these might be interesting to start off with individually. For example: the edition bump, and changes to the bench suite seem like they would be great stand-alone PRs. I'd also love to see the internal data structure improvements as their own PRs. There are also things I'm less sure about: in particular the changes to the public API may not necessarily be what we want. And I think I'd want to dig in deeper to the proxy trait solution to better understand how that works. But we should be able to do those things incrementally. @wishawa Would you be on board with splitting this PR into smaller PRs we can review those in a more granular manner? |
@yoshuawuyts That's a fair assessment. I'll split the changes into smaller PRs. Although the PR containing the main change (that allows O(woken subfutures) instead of O(total subfutures) polling) will still be pretty big because it involves changing the Readiness/Awakeness API that all the combinators uses. I'll be doing the rebase manually anyway so no need to revert #112 and #113. |
Performance improvements
WakerArray/WakerVec now keeps track of what futures were woken in a list, so that no O(n) search is needed when polling. This makes polling O(woken) instead of O(n). The impact is extremely significant for large combinators.
Also, the new implementation avoids locking and unlocking the Readiness Mutex in a loop. It copies out the data necessary for iteration once at the beginning of each poll.
I've also made WakerArray/WakerVec use a single Arc shared between all wakers (without giving up perfect waking) instead of needing a separate Arc for each. This change involves some unsafe work with RawWaker.
API changes
Race, TryJoin, and RaceOk for tuples now supports futures with heterogeneous results. Racing futures with different output types would return an enum whose variants are the possible outputs. If all outputs are the same type then there is a function to convert the enum to that type.
RaceOk error types are simplified to be just array/tuple/vec of the errors. I removed the wrapping AggregateError because it can be confusing for tuples (since error types are not necessarily homogeneous anymore).
Organizational changes
As part of rewriting the various combinators, I've merged the code for join/try_join/race/race_ok. There is now a crate-private futures::common module with a generic combinator whose behaviors can be controlled to match join/try_join/race/race_ok by a generic type parameter. For tuples, I basically implement try join and make every other combinator delegate to that.
I've also upped the edition to Rust 2021. This is not really necessary but the disjoint closure capture saves a few lines of code.
I renamed "Readiness" to "Awakeness" because the former gets confusing since Poll/PollState::Ready means the future is complete rather than awake.
Benchmark
Currently, CountdownFuture/Stream wake and complete in perfect order. This is unrealistic. I added shuffling (with a fixed seed) so that they better represent real workload.
Below:
after = this PR
before = origin/main with countdown shuffling commit cherry-picked