Implement perfect waking for array/vec Join #110

Swatinem · 2022-11-30T11:08:44Z

Tries to implement #21 for array and vec Join.

I’m a bit surprised as I have a ~400% regression in the benchmarks. Though I had a hard time figuring out how the benchmarks are working, and what they are even benchmarking.

Looks like the CountdownFuture used in benchmarks might not be ideal to test for this?
To properly test this, we could have a list of futures that wake from back to front, and count the number of times they were polled. With "perfect" waking, they should only be polled twice (initially, and a second time once they are woken).
By making the poll "expensive", the wins should be visible in the benchmarks as well.

Tries to implement yoshuawuyts#21 for array and vec Join.

Swatinem · 2022-11-30T11:32:29Z

I’m wondering if this can be optimized a bit, as each poll will now do at least one locking cycle. Can we quantify the effect of "bad behaved" futures with a slow poll, vs good futures that have a pending fast-path?
I would assume this is mostly a pessimization for futures with a fast-path.

Maybe we can even use the Join futures themselves as an example, as before this patch, they were recursing to each of their children instead of offering a fast-path via any_ready.

eholk · 2022-12-16T23:09:02Z

Whoa, sorry for letting this PR sit for two weeks! Thanks for the contribution. The code looks good, so I'm going to go ahead and merge it in the interest of parity with the other implementations. I think we should investigate the performance more closely though!

Looks like the CountdownFuture used in benchmarks might not be ideal to test for this?

Perhaps not. I think the goal of CountdownFuture is to try to construct a worst-case scenario for waking behavior, but we probably don't have it exactly right just yet. It'd be better if we had a variety of benchmarks though, and especially some that better approximate real-world behavior.

I’m wondering if this can be optimized a bit, as each poll will now do at least one locking cycle. Can we quantify the effect of "bad behaved" futures with a slow poll, vs good futures that have a pending fast-path?
I would assume this is mostly a pessimization for futures with a fast-path.

I was just looking over the implementation of WakerVec, and I think we should be able to optimize this quite a bit, and hopefully remove locks in most cases. I think we'll always need one to access the parent waker, but for the readiness vec we could trying using an atomic bit vector instead. In uncontended cases I expect the lock will be about the same, but with atomic ops it seems like contention should be less likely than if everything has to grab the same lock.

Also, we are currently creating an Arc for each of the InlineWakerVec entries, which puts a lot of pressure on the allocator and the Arc incurs some atomic operation costs as well.

Implement perfect waking for array/vec Join

d925dca

Tries to implement yoshuawuyts#21 for array and vec Join.

Swatinem force-pushed the perfect-join branch from dba2e92 to d925dca Compare November 30, 2022 11:24

eholk merged commit 8f243ae into yoshuawuyts:main Dec 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement perfect waking for array/vec Join #110

Implement perfect waking for array/vec Join #110

Swatinem commented Nov 30, 2022

Swatinem commented Nov 30, 2022

eholk commented Dec 16, 2022

Implement perfect waking for array/vec Join #110

Implement perfect waking for array/vec Join #110

Conversation

Swatinem commented Nov 30, 2022

Swatinem commented Nov 30, 2022

eholk commented Dec 16, 2022