Improve Swift concurrency safety #1376
Merged
1d2cdea to 09272b4
* Add lock around appending to the queue
* Add lock around entire logic of flushing the delta queue, including enqueuing and flushing in the executors
There was a report of a crash related to this dictionary. I have not been able to reproduce it by hammering the methods that manipulate the tasks dictionary from many threads, but access to the dictionary should be synchronized regardless.
I was able to create crashes by hammering threads to add tags concurrently. Adding the lock prevented the crashes, and behavior seems as expected. Unfortunately, the crashes were very flaky when I tried to reproduce them in tests, occurring in far fewer than 50% of runs, while the example app produced them much more consistently. That is why there is no unit test to reproduce this.
I was able to create crashes by hammering threads to add aliases and get aliases concurrently. Adding the lock prevented the crashes, and behavior seems as expected. I also had to lock the getters for onesignalId and externalId. Unfortunately, the crashes were very flaky when I tried to reproduce them in tests, occurring in far fewer than 50% of runs, while the example app produced them much more consistently. That is why there is no unit test to reproduce this.
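The locking pattern described in these commits can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `AliasStore`, its methods, and `runAliasStoreDemo` are hypothetical names, and `NSLock` stands in for the unfair lock mentioned in this PR so the sketch also builds outside Apple platforms.

```swift
import Foundation

// Hypothetical stand-in for a model that owns a dictionary of aliases.
// It is NOT the SDK's OSIdentityModel; names are for illustration only.
// @unchecked Sendable: thread safety is provided manually by the lock.
final class AliasStore: @unchecked Sendable {
    // Assumption: NSLock replaces the unfair lock used in the PR.
    private let lock = NSLock()
    private var aliases: [String: String] = [:]

    func addAliases(_ newAliases: [String: String]) {
        lock.lock()
        defer { lock.unlock() }
        for (label, id) in newAliases {
            aliases[label] = id
        }
    }

    // Getters take the same lock so a read never overlaps a write.
    func alias(for label: String) -> String? {
        lock.lock()
        defer { lock.unlock() }
        return aliases[label]
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return aliases.count
    }
}

// Hammer the store from many threads at once, similar in spirit to the
// manual reproduction described above. concurrentPerform returns only
// after every iteration has finished.
func runAliasStoreDemo() -> Int {
    let store = AliasStore()
    DispatchQueue.concurrentPerform(iterations: 1_000) { i in
        store.addAliases(["label\(i)": "id\(i)"])
        _ = store.alias(for: "label\(i)")
    }
    return store.count
}

print(runAliasStoreDemo())
```

Removing the lock calls from this sketch leaves the dictionary open to concurrent mutation, which is the kind of access the commits above guard against.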
* Reproduced crashes by creating multiple async threads that added and removed tags concurrently.
* Added a private dispatch queue to synchronize access to the delta queue and request queue.
* Crashes no longer happened after this change.
* It is possible for the executor to be flushing while a client response is received and modifies the request queue.
* Additionally, some code paths enqueue an update request but do not go through the operation repo, such as updating the session count at the start of a new session.
b4dc632 to 981aba6
* In a previous commit, an unfair lock was used to synchronize access to the delta queue and synchronize flushing behavior.
* A dispatch queue seems more appropriate for the Operation Repo to use, considering it already polls and flushes on a global queue.
* Without the lock or dispatch queue, I reproduced crashes by creating multiple async threads that either added tags or called to flush the operation repo.
* With the dispatch queue, those crashes do not happen and behavior seems as expected.
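A private serial dispatch queue used this way might look like the sketch below. It is a simplified illustration under assumptions: `DeltaRepo`, its string deltas, the queue label, and `runDeltaRepoDemo` are hypothetical and do not mirror the SDK's Operation Repo.

```swift
import Foundation

// Hypothetical, simplified stand-in for an operation repo.
// @unchecked Sendable: all mutable state is confined to `syncQueue`.
final class DeltaRepo: @unchecked Sendable {
    // Private serial queue: every read or write of the arrays below
    // runs here, one block at a time, in submission order.
    private let syncQueue = DispatchQueue(label: "example.deltarepo.sync")
    private var deltaQueue: [String] = []
    private var flushed: [String] = []

    func enqueue(_ delta: String) {
        syncQueue.async {
            self.deltaQueue.append(delta)
        }
    }

    func flush() {
        syncQueue.async {
            // Move everything pending into the flushed list in one step,
            // so no enqueue can interleave with the flush.
            self.flushed.append(contentsOf: self.deltaQueue)
            self.deltaQueue.removeAll()
        }
    }

    // Synchronous reads wait behind all previously submitted work.
    var flushedCount: Int { syncQueue.sync { flushed.count } }
    var pendingCount: Int { syncQueue.sync { deltaQueue.count } }
}

// Many threads either enqueue or flush at the same time.
func runDeltaRepoDemo() -> (flushed: Int, pending: Int) {
    let repo = DeltaRepo()
    DispatchQueue.concurrentPerform(iterations: 500) { i in
        repo.enqueue("tag\(i)")
        if i % 10 == 0 { repo.flush() }
    }
    repo.flush() // final flush, queued after all earlier work
    return (repo.flushedCount, repo.pendingCount)
}

let deltaDemo = runDeltaRepoDemo()
print(deltaDemo.flushed, deltaDemo.pending)
```

Because the queue is serial, callers on any thread can enqueue or flush without holding a lock themselves; ordering is decided by when each block reaches the queue.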
981aba6 to c45c081
jkasten2
approved these changes
Mar 4, 2024
This was referenced Mar 5, 2024
Description
One Line Summary
Add locking and synchronization to Swift codebase to prevent crashes in OneSignalOSCore and OneSignalUser.
Details
Motivation
SDK consumers have reported production crashes, and this PR aims to address the crashes that have been reported.
Scope
Adds synchronized to Background Tasks dictionary access, which is in Obj-C.
Implementation Details
This PR can be read commit by commit.
While Objective-C has the @synchronized directive, the Swift language lacks built-in synchronization features. Some modern concurrency features have been added to Swift but are not available until iOS 13.
Most of the changes started with me creating crashes by setting off multiple threads of execution, then testing again after making the fix. In addition to seeing the crashes stop, I also checked the logs to confirm that the behaviors themselves look correct.
This PR uses 2 methods to add concurrency safety.
Locks
Crashes were reproduced by calling addTag and removeTag with the same value on separate threads concurrently. Unfair locks allow the OS to be more performant by reducing context switching and prioritizing threads of higher priority.
Unfair locks are used in OSPropertiesModel and OSIdentityModel to manage access to their dictionaries.
Private Serial Dispatch Queues
Future Work
There are additional places that may be candidates for better concurrency safety, but by their nature or how they are accessed, they are shielded at another layer by the concurrency changes in this PR. This PR should cover the vast majority of crashes due to concurrency.
Getting unit tests to work. Reproducing crashes in unit tests was flakier than in the example app, even with the exact same code. The example app may have more going on that consumes resources or spawns more threads. It is also likely that the unit tests complete before all background threads have finished.
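One way a test could avoid finishing before its background work is to block on a dispatch group, sketched below. This is an assumption about how such a test might be structured, not something taken from this PR; `hammer`, `Counter`, and `runHammerDemo` are hypothetical names.

```swift
import Foundation

// Hypothetical helper: runs `work` as `tasks` background tasks and
// blocks the caller until every one of them has finished.
func hammer(tasks: Int, work: @escaping @Sendable (Int) -> Void) {
    let group = DispatchGroup()
    for i in 0..<tasks {
        // Each block is tracked by the group until it completes.
        DispatchQueue.global().async(group: group) {
            work(i)
        }
    }
    // Without this wait, the caller could return while tasks still run.
    group.wait()
}

// Small lock-protected counter used only to observe completed work.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

func runHammerDemo() -> Int {
    let counter = Counter()
    hammer(tasks: 200) { _ in counter.increment() }
    // Safe to read here: every background task has completed.
    return counter.current
}

print(runHammerDemo())
```

Waiting on the group only guarantees that the submitted tasks ran to completion; it does not make a timing-dependent crash any more likely to occur in a given run.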
Testing
Unit testing
Manual testing
Example App on physical iPhone 13 with iOS 17.2
The process I followed was:
For example, to try to reproduce a crash for an OSIdentityModel:
Another example, to try to reproduce the Operation Repo crashing:
Affected code checklist
Checklist
Overview
Testing
Final pass