Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SharedPreferences are sometimes not saved properly #36

Closed
shankari opened this issue Jul 6, 2015 · 17 comments
Closed

SharedPreferences are sometimes not saved properly #36

shankari opened this issue Jul 6, 2015 · 17 comments

Comments

@shankari
Copy link
Contributor

shankari commented Jul 6, 2015

It looks like SharedPreferences do not save properly in some cases.
Here's what I saw from the app logs for a trip in which I left Tied House at 8:30pm and came home.
The app appeared to have been killed and relaunched by the geofence exit, since the id was 9.

[9|1435807867723|7/1/15 8:31 PM|FINE]TripDiaryStateMachineReceiver : after reading from the prefs, the current state is local.state.waiting_for_trip_start

After processing the geofence exit, the new state was ongoing_trip.

[14|1435807867853|7/1/15 8:31 PM|FINE]TripDiaryStateMachineReceiver : newState after handling action is local.state.ongoing_trip

However, when the state is next read, it is still waiting_for_trip_start

[87|1435808413146|7/1/15 8:40 PM|FINE]TripDiaryStateMachineReceiver : after reading from the prefs, the current state is local.state.waiting_for_trip_start

So when the trip end is detected, we think that the current state is already waiting_for_trip_start,
so we do not turn off ongoing tracking, and we do not establish the geofence.

@shankari
Copy link
Contributor Author

shankari commented Jul 6, 2015

Additional logging added in pull request #35

@shankari
Copy link
Contributor Author

shankari commented Jul 6, 2015

A potential solution might be to use commit() instead of apply().
commit() saves synchronously and returns a result value
apply() saves asynchronously and fails silently. I am not sure what might cause the save to fail.
It is not supposed to be the android lifecycle because:

You don't need to worry about Android component lifecycles and their interaction with apply() writing to disk. The framework makes sure in-flight disk writes from apply() complete before switching states.

On the other hand, this is a service, not an activity. Maybe that's why it is confusing?

@sfwatergit
Copy link

If this refers specifically to the call on SharedPrefs in TripDiaryStateMachineReceiver, then it is due to the lifecycle of BroadcastReceiver, which is not a service, but ceases to exist as a separate component almost immediately after completion of any running code. You may wish to extend WakefulBroadcastReceiver, use call goAsync within the onReceive(Context,Intent) method, or simply refactor the BroadcastReceiver as an inner static class of a service, which is a bit more idiomatic in my experience. I tend to prefer use of the CommonsWare WakefulIntentService for state machines and other short small class that act as short-lived data synchronizers: https://github.com/commonsguy/cwac-wakeful.

@shankari
Copy link
Contributor Author

If this refers specifically to the call on SharedPrefs in TripDiaryStateMachineReceiver, then it is due to the lifecycle of BroadcastReceiver, which is not a service, but ceases to exist as a separate component almost immediately after completion of any running code.

Yes, it refers to TripDiaryStateMachineReceiver. However, as I quoted above, the documentation for apply() specifically says that

You don't need to worry about Android component lifecycles and their interaction with apply() writing to disk. The framework makes sure in-flight disk writes from apply() complete before switching states.

Presumably, this includes switching from the existing to the not-existing state.

Ah, I think that the confusion was because I said "this is a service". You are absolutely right that it is not. I meant to say "this is not an activity" and was sloppy. But the question about why the save failed is still, IMHO valid.

@sfwatergit
Copy link

There is some lack of detail in the Android documentation about killing a process with a thread running asynchronously.

Check out: http://developer.android.com/reference/android/content/BroadcastReceiver.html#onReceive(android.content.Context, android.content.Intent) http://developer.android.com/reference/android/content/BroadcastReceiver.html#onReceive(android.content.Context,%20android.content.Intent) and compare with:

Which says that you should not run asynchronous processes from the broadcast receiver’s onReceive(). Whereas:

http://developer.android.com/reference/android/content/SharedPreferences.Editor.html#apply() http://developer.android.com/reference/android/content/SharedPreferences.Editor.html#apply()

despite SharedPreferences.Editor.apply() being an async call, the docs indicate that you need not worry about state change, as you’ve mentioned.

So, my understanding is that even if the process is killed, even if your processing is not done within 10 secs any calls to async.apply() should eventually complete.

However, reading down a bit, I notice your last call in onReceive() is to connect to a service, which itself may be long-running… those Callbacks down there are worrisome… (I’ve experienced IntentServices killed off when connecting to a service after onStartCommand (though this can be ameliorated on a case-specific basis somewhat with START_STICKY).

OK, so aside from the Callbacks in the service (which may or may not return if the object reference is destroyed), the instant you call .connect(), the BR's onReceive() must return, which makes it eligible to be killed off immediately. The background processing may still run, though, if you manage to connect before Android kills the BackgroundReceiver. This may be dangerous; however, if several intents are received sequentially, since the context of the BroadcastReceiver may be leaked. Well, this is perhaps intentional, since I see you are maintaining the context as an instance variable.

So, another possibility is that the action bearing end_trip is waiting to be written; however, another .apply() is called almost immediately afterwards on the waiting_for_trip_start Action. Despite what is written there, my hunch is that the .apply()s are not completed synchronously, since they are, as indicated, asynchronous writes (despite living in separate processes and saving immediately to memory, the actual IO on the SharedPreference (which is an XML file) must be synchronous. The ordering and safety guarantee, I believe, on applies to, um, .apply()s, which are initiated in the same process (i.e., the BroadcastReceiver), but not necessarily to .apply()s called from different processes, which is what would happen if the BroadcastReceiver responds to Intent B before Intent A’s .apply() calls have a chance to complete.

Do you think these reasons apply in your situation?

Sid Feygin

Ph. D Student
UC Berkeley: Civil and Environmental Engineering Department, Civil Systems
[email protected]

smartcities.berkeley.edu
xmobile.berkeley.edu

On Aug 22, 2015, at 11:25 PM, shankari [email protected] wrote:

If this refers specifically to the call on SharedPrefs in TripDiaryStateMachineReceiver, then it is due to the lifecycle of BroadcastReceiver, which is not a service, but ceases to exist as a separate component almost immediately after completion of any running code.

Yes, it refers to TripDiaryStateMachineReceiver. However, as I quoted above, the documentation for apply() specifically says that

You don't need to worry about Android component lifecycles and their interaction with apply() writing to disk. The framework makes sure in-flight disk writes from apply() complete before switching states.

Presumably, this includes switching from the existing to the not-existing state.

Ah, I think that the confusion was because I said "this is a service". You are absolutely right that it is not. I meant to say "this is not an activity" and was sloppy. But the question about why the save failed is still, IMHO valid.


Reply to this email directly or view it on GitHub #36 (comment).

@shankari
Copy link
Contributor Author

However, reading down a bit, I notice your last call in onReceive() is to connect to a service, which itself may be long-running… those Callbacks down there are worrisome… (I’ve experienced IntentServices killed off when connecting to a service after onStartCommand (though this can be ameliorated on a case-specific basis somewhat with START_STICKY).

Which service are you talking about? The callbacks for creating the geofence, etc? IIRC, those are executed in the thread of the geofence creation, not the thread of the broadcast receiver. So basically, we start a new thread to perform a new operation and provide code to be run in that thread once the operation is complete.

Do you think these reasons apply in your situation?
I am not sure - it seems like there is a fair amount of guessing/reading between the lines on what the apply() call does, which feels like reverse engineering the android library. I would prefer not to reverse engineer undocumented behavior which can change in future versions unless I absolutely have to. I would prefer to robustify the code and move on.

My plan for this issue was as follows:

  1. Improve logging
  2. See if it recurs
  3. If it does, change the apply() to commit(), which is a blocking, synchronous call.

#2 above hasn't happened yet, so I haven't made the change. If it doesn't recur until I am ready to deploy to the play store, I will probably make the change anyway just to be on the safe side. The performance implications don't appear to be very significant.

@shankari
Copy link
Contributor Author

Looks like the file handler option to log to file is in fact not thread safe.

Logging with both the filehandler and a database handler at the same time indicates that chunks of log are just dropped on the floor, although the presence of a .lck file would seem to indicate that the FileHandler does file level locking. Apparently it is not good enough across contexts.

See #42 for more details. So it might just be that the change in the preferences was not logged correctly and this might be a non-issue after all.

@sfwatergit
Copy link

The mention of services above was to reiterate that despite the fact that you would run the geofence creation in a separate thread, the BroadcastReceiver component and process would certainly will be destroyed. This may or may not a problem for the thread, since it maintains the object reference (hence the leaked context) and won't be garbage collected, but the thread interaction with repeated calls to the BroadcastIntent (and any services invoked therein) may result in race conditions.

See also: https://groups.google.com/forum/#!topic/android-developers/iqwmDdUumXg . This describes a very similar issue.

Even improved logging will make it difficult to catch a race condition. However, SharedPreferences.OnSharedPreferenceChangeListener may be useful here if you would like to log exactly when the shared prefs are changed. I can imagine you could test this by running a long-running service from your main activity (bound or unbound) and listen for changes from the broadcast receiver.

Also, are you able to recover the logcat?

@shankari
Copy link
Contributor Author

Also, are you able to recover the logcat?

No. The events happened on 1st Jul. I noticed this on 5th Jul and filed the bug. This incident had already moved out of logcat by then.

Even improved logging will make it difficult to catch a race condition.

Agreed. But once we know that the file based logging is not reliable, there is no indication that there was a race condition, or even that the shared preferences were not saved correctly.

I originally thought that the shared preferences were not saved correctly because I saw a newState = ongoing_trip followed by read from the shared preferences of waiting_for_trip_state. But if we could have missing logs, then it could be that we detected a trip end in the interim and that legitimately caused the state the change, except the trip end detection was overriden.

I am going to close this and the other "mysterious behavior with location stuff" bugs for now since they were not based on reliable logs. I will reopen them if I see the same behavior in the database-based logs.

@shankari
Copy link
Contributor Author

I am seeing similar behavior even with the reliable logs, so it looks like the callbacks don't work properly from a BroadcastReceiver, as @sfwatergit suggested.

Below, we show three instances where this happened. First, let us show what a successful invocation looks like.

[1|1441757464330|9/8/15 5:11 PM|FINE]GeofenceExitIntentService onCreate called
[3|1441757464396|9/8/15 5:11 PM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 sta
rtId 1
[5|1441757464437|9/8/15 5:11 PM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1441757464476|9/8/15 5:11 PM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1441757464573|9/8/15 5:11 PM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.874832,-122.259241 acc=845 et=+12d1h19m15s758ms]
[11|1441757464633|9/8/15 5:11 PM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing m
onitoring...
[13|1441757464869|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
[15|1441757464985|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver onConnected(null) called
[17|1441757465044|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleAction(local.state.waiting_for_trip_start, local.transition.exited_geofence) called
[19|1441757465078|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleTripStart(local.transition.exited_geofence) called
[21|1441757465131|9/8/15 5:11 PM|FINE]CreateGeofenceAction Removing geofence with ID = DYNAMIC_EXIT_GEOFENCE
[23|1441757465211|9/8/15 5:11 PM|FINE]LocationHandler default request is Request[PRIORITY_BALANCED_POWER_ACCURACY requested=3600000ms fastest=600000ms]
[25|1441757465288|9/8/15 5:11 PM|FINE]ActivityRecognitionHandler Starting activity recognition with interval = 30000
[27|1441757465342|9/8/15 5:11 PM|FINE]LocationChangeIntentService onCreate called
[29|1441757465381|9/8/15 5:11 PM|FINE]LocationChangeIntentService onStart called with Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.LocationChangeIntentService (has extras) } startId 1

Here are the failures.

[1|1441979577703|9/11/15 6:52 AM|FINE]GeofenceExitIntentService onCreate called
[3|1441979577773|9/11/15 6:52 AM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 st
artId 1
[5|1441979577818|9/11/15 6:52 AM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1441979577860|9/11/15 6:52 AM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1441979577942|9/11/15 6:52 AM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.391921,-122.085298 acc=10 et=+14d15h1m9s567ms alt=-5.70001220703125 vel=6.6303 bear=29.0]
[11|1441979577986|9/11/15 6:52 AM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent { c
mp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing 
monitoring...
[13|1441979578216|9/11/15 6:52 AM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start

@shankari
Copy link
Contributor Author

Another one.

[1|1441680097199|9/7/15 7:41 PM|FINE]GeofenceExitIntentService onCreate called
[3|1441680097295|9/7/15 7:41 PM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 sta
rtId 1
[5|1441680097353|9/7/15 7:41 PM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1441680097398|9/7/15 7:41 PM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1441680097531|9/7/15 7:41 PM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.390951,-122.086278 acc=30 et=+11d3h49m46s980ms]
[11|1441680097576|9/7/15 7:41 PM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing m
onitoring...
[13|1441680097811|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
[15|1441680097943|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver onConnected(null) called
[17|1441680097991|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleAction(local.state.waiting_for_trip_start, local.transition.exited_geofence) called

@shankari
Copy link
Contributor Author

In both those cases, the thread is terminated at that point. I can tell because the log ID gets reset right after that.

[17|1441680097991|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleAction(local.state.waiting_for_trip_start, local.transition.exited_geofence) called
[1|1441682030836|9/7/15 8:13 PM|FINE]BuiltinUserCache While searching for regex, got 1 results
[13|1441979578216|9/11/15 6:52 AM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
[1|1441980831591|9/11/15 7:13 AM|FINE]BuiltinUserCache While searching for regex, got 0 results

It is really clear in the first of these, since the implementation of handleAction is basically a simple if/then statement that should have called handleTripStart. There are no async calls that might fail between those two statements, which makes it pretty clear that the process was terminated at that point.

I am converting this to a service. Thanks to @sfwatergit for the catch!

@shankari
Copy link
Contributor Author

I see this behavior even after converting to a service. However, I think that in this case, there might be a race caused by the delay in write.

Here are the related logs:

We detect the end of the trip

[41|1442946453283|9/22/15 11:27 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleAction(local.state.ongoing_trip, local.transition.stopped_moving) called
[43|1442946453406|9/22/15 11:27 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleTripEnd(local.transition.stopped_moving) called
[45|1442946453511|9/22/15 11:27 AM|FINE]ActivityRecognitionHandler Stopping activity recognition with interval = 30000
[47|1442946453596|9/22/15 11:27 AM|FINE]CreateGeofenceAction mLastLocation has elapsed time = 324019199000000
[49|1442946453631|9/22/15 11:27 AM|FINE]CreateGeofenceAction Last location is Location[fused 37.870491,-122.260089 acc=30 et=+3d18h0m19s199ms] creating geofence
[51|1442946453748|9/22/15 11:27 AM|FINE]CreateGeofenceAction creating geofence at location 37.8704911, -122.2600891
[53|1442946453960|9/22/15 11:27 AM|FINE]TripDiaryStateMachineService newState after handling action is local.state.waiting_for_trip_start

Almost immediately, the geofence exits so we read the value from a different context.

The value read is still the old one (ongoing_trip) so we ignore it.

[1|1442946547133|9/22/15 11:29 AM|FINE]GeofenceExitIntentService onCreate called
[3|1442946547349|9/22/15 11:29 AM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 startId 1
[5|1442946548410|9/22/15 11:29 AM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1442946548482|9/22/15 11:29 AM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1442946548879|9/22/15 11:29 AM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.870813,-122.258860 acc=21 et=+3d18h1m55s322ms bear=9.0]
[11|1442946548953|9/22/15 11:29 AM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent {cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing
 monitoring...
[1|1442946550661|9/22/15 11:29 AM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReciever onReceive(android.app.ReceiverRestrictedContext@421d54b0, Intent { act=local.transition.exited_geofence flg=0x10 cmp=edu.berkeley.eecs.cfc_tracker/.location.TripDiaryStateMachineReceiver }) called
[3|1442946550829|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService after reading from the prefs, the current state is local.state.ongoing_trip
[5|1442946551237|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver onConnected(null) called
[7|1442946551300|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleAction(local.state.ongoing_trip, local.transition.exited_geofence) called
[9|1442946551371|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleTripEnd(local.transition.exited_geofence) called

I can think of two possible reasons:

  1. There is a race - the apply() hasn't yet saved the state to preferences before the geofence reads it. The argument in favor of this case is that the gap between the end and the next start is small. But on the other hand, the gap is a couple of minutes, which is an eternity in computer time. But we also don't know how aggressive the android system is.
  2. Note though that the code expects to print a "newState saved in prefManager is " line after the apply() but it doesn't. That argues that maybe the service was also terminated aggressively, before the apply() was invoked. But we have moved to a service.

Next steps - figure out the lifecycle of an intent service by reading docs. I have already done this a bit, and it is not clear when it is terminated but I could try some more.

Next, I will try two options, in this order:

  1. switch to commit()
  2. switch to a generic service and call stopSelf() only after the new state is set

@shankari
Copy link
Contributor Author

Actually, this is probably a result of (2) above.
This stackoverflow answer (http://stackoverflow.com/questions/15524280/service-vs-intentservice)
linked to the source code of IntentService
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/app/IntentService.java
which is pretty clear.

Once the onHandleIntent() is completed, stopSelf() is called.
This means that any async tasks launched from onHandleIntent() are not guaranteed to finish.

Will fix this by using a standard Service instead of an IntentService and calling stopSelf from the setNewState method to ensure that we wait until the state is set before the service is terminated.

@sfwatergit
Copy link

There may be an issue with maintenance of the wake lock when activating the
phone following sleep. You might visited the wakeful intent service
mentioned earlier. I have never find I/o issues using this dependency. If
you prefer not to use an external library, consider the nearly equivalent
wakeful broadcast receiver.
On Sep 22, 2015 9:34 PM, "shankari" [email protected] wrote:

Actually, this is probably a result of (2) above.
This stackoverflow answer (
http://stackoverflow.com/questions/15524280/service-vs-intentservice)
linked to the source code of IntentService

https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/app/IntentService.java
which is pretty clear.

Once the onHandleIntent() is completed, stopSelf() is called.
This means that any async tasks launched from onHandleIntent() are not
guaranteed to finish.

Will fix this by using a standard service instead of an IntentService and
calling stopSelf from the setNewState method to ensure that we wait until
the state is set before the service is terminated.


Reply to this email directly or view it on GitHub
#36 (comment)
.

@shankari
Copy link
Contributor Author

shankari commented Oct 2, 2015

@sfwatergit I really don't think that a wakelock is the issue. While I am using a BroadcastReceiver, it is not triggered from an alarm, which is admittedly the most common BroadcastReceiver. Instead, I am generating the broadcasts in my code, which is invoked from IntentServices that handle various location based intents that I registered for.

From the documentation for your suggested library (emphasis mine)

The recommended pattern for Android's equivalent to cron jobs and Windows scheduled tasks is to use AlarmManager. This works well when coupled with an IntentService, as the service will do its work on a background thread and shut down when there is no more work to do.

There's one small problem: IntentService does nothing to keep the device awake. If the alarm was a WAKEUP variant, the phone will only stay awake on its own while the BroadcastReceiver handling the alarm is in its onReceive() method. Otherwise, the phone may fall back asleep.

I have checked in my fix to use a regular Service instead of an IntentService, and not hit this for a while. Fingers crossed...

@shankari
Copy link
Contributor Author

shankari commented Oct 2, 2015

Should be fixed by ec63703

@shankari shankari closed this as completed Oct 2, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants