// Measure the poll start time. Note that we may end up polling other
// tasks under this measurement. In this case, the tasks came from the
// LIFO slot and are considered part of the current task for scheduling
// purposes. These tasks inherit the "parent"'s limits.
core.stats.start_poll();

// Make the core available to the runtime context
*self.core.borrow_mut() = Some(core);

// Run the task
coop::budget(|| {
    task.run();

    let mut lifo_polls = 0;

    // As long as there is budget remaining and a task exists in the
    // `lifo_slot`, then keep running.
    loop {
        // Check if we still have the core. If not, the core was stolen
        // by another worker.
        let mut core = match self.core.borrow_mut().take() {
            Some(core) => core,
            None => {
                // In this case, we cannot call `reset_lifo_enabled()`
                // because the core was stolen. The stealer will handle
                // that at the top of `Context::run`
                return Err(());
            }
        };

        // Check for a task in the LIFO slot
        let task = match core.lifo_slot.take() {
            Some(task) => task,
            None => {
                self.reset_lifo_enabled(&mut core);
                core.stats.end_poll();
                return Ok(core);
            }
        };

        if !coop::has_budget_remaining() {
            core.stats.end_poll();

            // Not enough budget left to run the LIFO task, push it to
            // the back of the queue and return.
            core.run_queue.push_back_or_overflow(
                task,
                &*self.worker.handle,
                &mut core.stats,
            );
            // If we hit this point, the LIFO slot should be enabled.
            // There is no need to reset it.
            debug_assert!(core.lifo_enabled);
            return Ok(core);
        }

        // Track that we are about to run a task from the LIFO slot.
        lifo_polls += 1;
        super::counters::inc_lifo_schedules();

        // Disable the LIFO slot if we reach our limit
        //
        // In ping-pong style workloads where task A notifies task B,
        // which notifies task A again, continuously prioritizing the
        // LIFO slot can cause starvation as these two tasks will
        // repeatedly schedule the other. To mitigate this, we limit the
        // number of times the LIFO slot is prioritized.
        if lifo_polls >= MAX_LIFO_POLLS_PER_TICK {
            core.lifo_enabled = false;
            super::counters::inc_lifo_capped();
        }

        // Run the LIFO task, then loop
        *self.core.borrow_mut() = Some(core);
        let task = self.worker.handle.shared.owned.assert_owner(task);
        task.run();
Although the comment confirms this behavior is intentional, it seems very counterintuitive to merge poll times across multiple individual polls. This could produce misleading results, reporting high poll times when none actually exist.
This is not precisely a bug; I'm opening this issue to discuss whether this behavior is what we want.
I looked at the commit where the comment was added.
It looks like the poll time measurement is conflating two things: first, the actual task poll time that the end user sees, and second, the time the scheduler spends in a task "critical section", i.e. the span between points where the scheduler is free to do maintenance work like polling the I/O driver. The LIFO slot optimization groups multiple tasks into a single "unit" to improve ping-pong style messaging patterns, and the scheduler treats that unit as one "critical section".
The task poll time histogram should report true task poll times, so we probably do want to rework this to decouple those two concepts.
tokio/tokio/src/runtime/scheduler/multi_thread/worker.rs
Lines 585 to 659 in a82bdee