Skip to content

Commit

Permalink
wmediumd: lib: sched: fix another scheduling corner case
Browse files Browse the repository at this point in the history
When running with an external scheduler that also uses the
event loop, we can detect e.g. a client disconnecting from
a server while in usfstl_sched_forward(), causing us to not
have a job anymore on the scheduler afterwards, which then
causes the assert at the end to get reached erroneously.

Move the job check and external wait into the loop so these
cases are covered correctly.

This actually happened in wmediumd on client disconnect at
this exact time, while running usfstl_sched_forward().
  • Loading branch information
jmberg-intel authored and bcopeland committed Mar 2, 2021
1 parent de6f118 commit 717e5d7
Showing 1 changed file with 33 additions and 15 deletions.
48 changes: 33 additions & 15 deletions wmediumd/lib/sched.c
Original file line number Diff line number Diff line change
Expand Up @@ -182,22 +182,38 @@ void usfstl_sched_start(struct usfstl_scheduler *sched)

struct usfstl_job *usfstl_sched_next(struct usfstl_scheduler *sched)
{
struct usfstl_job *job;

/*
* If external scheduler is active, we might get here with nothing
* to do, so we just need to wait for an external input/job which
* will add an job to our scheduler in usfstl_sched_add_job().
*
* And due to the fact that we don't have any API for canceling external
* time request, we can request external time which adds a job on the
* external scheduler and cancel internally, and get scheduled to run
* with nothing to do.
*/
while (usfstl_list_empty(&sched->joblist) && sched->external_request)
usfstl_sched_external_wait(sched);
while (true) {
struct usfstl_job *job = usfstl_sched_next_pending(sched, NULL);

if (!job) {
/*
* If external scheduler is active, we might get here
* with nothing to do, so we just need to wait for an
* external input/job which will add a job to our
* scheduler.
*
* Due to the fact that we don't have any API for
* cancelling external time requests, we might have
* requested time from the external scheduler for a
* job that subsequently got removed, ending up here
* without a job, or one further in the future which
* would cause usfstl_sched_forward() to wait again.
*
* Additionally, we might only remove the job we just
* found during the usfstl_sched_forward() below, if
* that causes the main loop to run and we detect an
* event that causes a job removal (such as a client
* disconnecting from a server), so the job pointer we
* have might go stale. Hence, all of this needs to be
* checked in the overall loop.
*/
if (sched->external_request) {
usfstl_sched_external_wait(sched);
continue;
}
break;
}

while ((job = usfstl_sched_next_pending(sched, NULL))) {
/*
* Forward, but only if job isn't in the past - this
* can happen if some job was inserted while we
Expand All @@ -214,6 +230,8 @@ struct usfstl_job *usfstl_sched_next(struct usfstl_scheduler *sched)
* might have inserted an earlier job into the timeline.
* If it's not this job's turn yet, reinsert it and check
* what's up next in the next loop iteration.
*
* Also, 'job' might now have been removed, see above.
*/
if (usfstl_sched_next_pending(sched, NULL) != job)
continue;
Expand Down

0 comments on commit 717e5d7

Please sign in to comment.