Dequeue handling is complicated for sched_ext because it's the only case where task ownership needs to be taken away from the BPF scheduler. The usual scheduling flow is that once a task wakes up and gets queued on the BPF scheduler (whether that's on a BPF data structure or a custom DSQ), the BPF scheduler has full ownership of the task until the scheduler decides to dispatch it. This clean transition of ownership makes it unnecessary to couple locking on the kernel side (per-rq locking) with whatever synchronization scheme the BPF scheduler uses.
However, dequeue doesn't follow this rule. When a task's property needs to be changed, the task is forcibly dequeued from the scheduler and re-enqueued after the property change. This can happen anytime and can't be denied or delayed. To support this while allowing hot-migration (e.g. for sharing a DSQ across multiple CPUs), sched_ext does a bit of synchronization dancing using `p->scx.holding_cpu`. This, combined with other factors, also allows dispatches to be opportunistic: a BPF scheduler is allowed to dispatch a task which it doesn't currently own. sched_ext core will ignore such dispatches and simply try again. This in turn allows BPF schedulers to ignore and not implement `ops.dequeue()`. As a result, `ops.dequeue()` as currently implemented is only called when the sched_ext core knows the task to be on a BPF data structure.
While not implementing `ops.dequeue()` works fine most of the time, there are cases where the BPF scheduler wants to track ownership and property transitions, and the current implementation of `ops.dequeue()` doesn't support that. For example, `scx_layered` wants to track a per-DSQ sum of the average runtimes of queued tasks, and a straightforward way to do that would be to add a task's value when it is enqueued and subtract it when the task is either dispatched or dequeued. However, because `ops.dequeue()` is not called when a task is on a DSQ, there's no way to detect dequeues. Instead, `scx_layered` depends on detecting back-to-back enqueues: https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/bpf/main.bpf.c#L1177
Update `ops.dequeue()` so that an enqueued task is always either dispatched (moved to a local or global DSQ) or dequeued.
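To make the intended accounting concrete, here is a minimal user-space sketch in plain C of the add-on-enqueue, subtract-on-dispatch-or-dequeue scheme described above. It is not sched_ext or `scx_layered` code: `NR_DSQS`, `struct dsq_stats`, `struct task_model`, and the `model_*()` functions are all hypothetical names made up for illustration, and the model simply assumes the proposed guarantee that every enqueue is followed by exactly one dispatch or dequeue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical number of DSQs tracked by this model. */
#define NR_DSQS 4

/* Hypothetical per-DSQ statistics a scheduler might want to keep. */
struct dsq_stats {
	uint64_t queued_runtime_sum;	/* sum of avg_runtime over queued tasks */
	uint32_t nr_queued;		/* number of tasks currently queued */
};

/* Hypothetical stand-in for per-task scheduler state. */
struct task_model {
	uint64_t avg_runtime;	/* task's average runtime, arbitrary units */
	int dsq;		/* DSQ the task is queued on, -1 if none */
};

struct dsq_stats dsq_stats[NR_DSQS];

/* Enqueue path: the task becomes owned by the scheduler, so add it in. */
void model_enqueue(struct task_model *t, int dsq)
{
	assert(dsq >= 0 && dsq < NR_DSQS);
	assert(t->dsq < 0);	/* relies on enqueues never being back-to-back */

	t->dsq = dsq;
	dsq_stats[dsq].queued_runtime_sum += t->avg_runtime;
	dsq_stats[dsq].nr_queued++;
}

/* Shared exit path: the task leaves the scheduler's ownership. */
static void model_leave(struct task_model *t)
{
	if (t->dsq < 0)
		return;	/* not queued, nothing to undo */

	dsq_stats[t->dsq].queued_runtime_sum -= t->avg_runtime;
	dsq_stats[t->dsq].nr_queued--;
	t->dsq = -1;
}

/* Dispatch path: the task moves to a local or global DSQ. */
void model_dispatch(struct task_model *t)
{
	model_leave(t);
}

/* Dequeue path: the task is forcibly removed, e.g. for a property change. */
void model_dequeue(struct task_model *t)
{
	model_leave(t);
}
```

With the current behavior, the `model_dequeue()` step has no counterpart for tasks sitting on a DSQ, which is why the sum cannot be kept balanced this way today. The subtraction also assumes `avg_runtime` does not change while the task is queued; a real scheduler would likely need to remember the value it added.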