Refactor async_rw_mutex
#1379
24103a1 to 0cb0dba
I think I'm done with the refactorings, for now at least. I'd appreciate someone taking at least a high-level look at this. I've added more text and diagrams on how the implementation works. It could probably still be expanded, but I'm hoping it provides a better explanation than what was there before. Note that the implementation has changed enough that the new description does not match what was done before.
0cb0dba to 51812d5
It looks like the performance regression is caused by the linked operation states being accessed in reverse order compared to before. This only affected the GPU backend in DLA-Future, where GPU work is scheduled inline. I've pushed a commit which restores the order in which continuations are called, and I'm rerunning benchmarks. We shouldn't make this order a guarantee, but I'm keeping it unchanged in this PR so as not to upset DLA-Future performance for now. We can see later whether it's possible to relax the order without affecting performance.
Reversing the order of continuations now brings the performance very close to what it was with the old implementation. Some algorithms in DLA-Future still show a tiny performance improvement, but nothing dramatic.
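For readers following the ordering discussion, here is a minimal sketch of why a push-front intrusive list visits entries in reverse, and one way to get insertion order back by reversing the list before walking it. The `node` type and both functions are hypothetical illustrations, not the code in this PR, and the sketch does not claim this is how the commit restores the order.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for a linked operation state.
struct node
{
    int id;
    node* next = nullptr;
};

// Reverse a singly linked list in place and return the new head.
node* reverse(node* head)
{
    node* prev = nullptr;
    while (head != nullptr)
    {
        node* next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}

std::vector<int> visit_in_insertion_order()
{
    node first{1}, second{2}, third{3};

    // Push-front insertion: the most recently added node becomes the head,
    // so a plain walk from the head would visit 3, 2, 1.
    node* head = nullptr;
    for (node* n : {&first, &second, &third})
    {
        n->next = head;
        head = n;
    }

    // Reversing before the walk visits the nodes in the order they were added.
    std::vector<int> visited;
    for (node* n = reverse(head); n != nullptr; n = n->next)
    {
        visited.push_back(n->id);
    }
    return visited;
}
```

The reversal costs one extra pass over the list, which is the kind of overhead that relaxing the order later could avoid.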
This is again ready for review. Changes since last time are outlined in the previous comments.
This allows avoiding synchronization required when passing the value from one shared state to another.
Use the shared state already stored in the operation state in continuations.
…hared state Don't do it in the previous shared state, for simpler reasoning about ownership.
…on in async_rw_mutex for continuations
Explicitly specify expected type to avoid unwanted constructor calls.
The value is set directly in the constructor.
…y in async_rw_mutex" This reverts commit 7851830.
…red state" This reverts commit a584683.
…t more straightforward
…etween void and non-void case
…c_rw_mutex Reset the shared state before updating the head of the queue. Once the head of the queue is updated, there's a small time window where continuations could be run inline, and resetting the shared state in `done` could release the last reference to the shared state. Since we want to ensure that the last reference is always released in a continuation, we move the resetting of the shared state to happen before calling `done`. It's safe to do because if no continuations have been added, the shared state is still kept alive by senders, and if continuations have been added, they'll also hold references to the shared state.
d00c0e1 to 0af4fda
This is an attempt at slightly simplifying and optimizing the internals of `async_rw_mutex`. It avoids the need for a lock to keep track of continuations and instead triggers continuations through operation states. The continuations are linked to each other through an intrusive linked list of operation states. These changes also avoid the need for a weak shared pointer between shared states.

Overall, I'm hoping that removing the extra reference counting and locks will slightly improve performance. Fundamentally, though, the structure is still the same and requires the same number of dynamic allocations as before (in fact, one more allocation for the value stored by the mutex; cf. #1125, which this PR does not address). So the performance impact may be minimal. However, I'm also making these changes to make the dependency triggering a bit more understandable.
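To illustrate the idea of triggering continuations through an intrusive linked list of operation states without a lock, here is a minimal sketch. All names (`op_state`, `shared_state`, `add_continuation`, `run_lifo_example`) are hypothetical and simplified for illustration; the actual implementation in this PR differs, in particular in how ownership and values are handled.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an operation state waiting for access.
struct op_state
{
    op_state* next = nullptr;    // intrusive link: no separate list node is allocated
    std::function<void()> continuation;
};

// Hypothetical stand-in for the shared state of one access to the mutex.
class shared_state
{
    op_state done_marker;    // its address marks "access already finished"
    std::atomic<op_state*> head{nullptr};

public:
    // Returns true if the operation state was queued. Returns false if the
    // access has already finished, in which case the caller runs the
    // continuation inline.
    bool add_continuation(op_state& op)
    {
        op_state* old_head = head.load(std::memory_order_acquire);
        do
        {
            if (old_head == &done_marker) { return false; }
            op.next = old_head;
        } while (!head.compare_exchange_weak(old_head, &op,
            std::memory_order_acq_rel, std::memory_order_acquire));
        return true;
    }

    // Marks the access as finished and runs all queued continuations.
    void done()
    {
        op_state* current = head.exchange(&done_marker, std::memory_order_acq_rel);
        while (current != nullptr)
        {
            // Read the link first: in a real implementation the continuation
            // may destroy the operation state it belongs to.
            op_state* next = current->next;
            current->continuation();
            current = next;
        }
    }
};

std::vector<int> run_lifo_example()
{
    std::vector<int> order;
    shared_state state;
    op_state a, b, late;
    a.continuation = [&] { order.push_back(1); };
    b.continuation = [&] { order.push_back(2); };
    late.continuation = [&] { order.push_back(3); };

    state.add_continuation(a);
    state.add_continuation(b);
    state.done();    // push-front list: b runs before a

    // Added after done(): not queued, so run inline.
    if (!state.add_continuation(late)) { late.continuation(); }
    return order;
}
```

In this sketch the only synchronization is a single atomic head pointer, and the list nodes live inside the operation states themselves, so queuing a continuation needs neither a lock nor an extra allocation.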