Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement perfect waking for array/vec Join #110

Merged
merged 1 commit into from
Dec 16, 2022

Conversation

Swatinem
Copy link
Contributor

Tries to implement #21 for array and vec Join.

I’m a bit surprised as I have a ~400% regression in the benchmarks. Though I had a hard time figuring out how the benchmarks are working, and what they are even benchmarking.

Looks like the CountdownFuture used in benchmarks might not be ideal to test for this?
To properly test this, we could have a list of futures that wake from back to front, and count the number of times they were polled. With "perfect" waking, they should only be polled twice (initially, and a second time once they are woken).
By making the poll "expensive", the wins should be visible in the benchmarks as well.

Tries to implement yoshuawuyts#21 for array and vec Join.
@Swatinem
Copy link
Contributor Author

I’m wondering if this can be optimized a bit, as each poll will now do at least one locking cycle. Can we quantify the effect of "bad behaved" futures with a slow poll, vs good futures that have a pending fast-path?
I would assume this is mostly a pessimization for futures with a fast-path.

Maybe we can even use the Join futures themselves as an example, as before this patch, they were recursing to each of their children instead of offering a fast-path via any_ready.

@eholk
Copy link
Collaborator

eholk commented Dec 16, 2022

Whoa, sorry for letting this PR sit for two weeks! Thanks for the contribution. The code looks good, so I'm going to go ahead and merge it in the interest of parity with the other implementations. I think we should investigate the performance more closely though!

Looks like the CountdownFuture used in benchmarks might not be ideal to test for this?

Perhaps not. I think the goal of CountdownFuture is to try to construct a worst-case scenario for waking behavior, but we probably don't have it exactly right just yet. It'd be better if we had a variety of benchmarks though, and especially some that better approximate real-world behavior.

I’m wondering if this can be optimized a bit, as each poll will now do at least one locking cycle. Can we quantify the effect of "bad behaved" futures with a slow poll, vs good futures that have a pending fast-path?
I would assume this is mostly a pessimization for futures with a fast-path.

I was just looking over the implementation of WakerVec, and I think we should be able to optimize this quite a bit, and hopefully remove locks in most cases. I think we'll always need one to access the parent waker, but for the readiness vec we could trying using an atomic bit vector instead. In uncontended cases I expect the lock will be about the same, but with atomic ops it seems like contention should be less likely than if everything has to grab the same lock.

Also, we are currently creating an Arc for each of the InlineWakerVec entries, which puts a lot of pressure on the allocator and the Arc incurs some atomic operation costs as well.

@eholk eholk merged commit 8f243ae into yoshuawuyts:main Dec 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants