Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poll time in the poll time histogram merges multiple polls #7106

Open
rcoh opened this issue Jan 17, 2025 · 1 comment
Open

poll time in the poll time histogram merges multiple polls #7106

rcoh opened this issue Jan 17, 2025 · 1 comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-metrics Module: tokio/runtime/metrics

Comments

@rcoh
Copy link
Contributor

rcoh commented Jan 17, 2025

// Measure the poll start time. Note that we may end up polling other
// tasks under this measurement. In this case, the tasks came from the
// LIFO slot and are considered part of the current task for scheduling
// purposes. These tasks inherent the "parent"'s limits.
core.stats.start_poll();
// Make the core available to the runtime context
*self.core.borrow_mut() = Some(core);
// Run the task
coop::budget(|| {
task.run();
let mut lifo_polls = 0;
// As long as there is budget remaining and a task exists in the
// `lifo_slot`, then keep running.
loop {
// Check if we still have the core. If not, the core was stolen
// by another worker.
let mut core = match self.core.borrow_mut().take() {
Some(core) => core,
None => {
// In this case, we cannot call `reset_lifo_enabled()`
// because the core was stolen. The stealer will handle
// that at the top of `Context::run`
return Err(());
}
};
// Check for a task in the LIFO slot
let task = match core.lifo_slot.take() {
Some(task) => task,
None => {
self.reset_lifo_enabled(&mut core);
core.stats.end_poll();
return Ok(core);
}
};
if !coop::has_budget_remaining() {
core.stats.end_poll();
// Not enough budget left to run the LIFO task, push it to
// the back of the queue and return.
core.run_queue.push_back_or_overflow(
task,
&*self.worker.handle,
&mut core.stats,
);
// If we hit this point, the LIFO slot should be enabled.
// There is no need to reset it.
debug_assert!(core.lifo_enabled);
return Ok(core);
}
// Track that we are about to run a task from the LIFO slot.
lifo_polls += 1;
super::counters::inc_lifo_schedules();
// Disable the LIFO slot if we reach our limit
//
// In ping-ping style workloads where task A notifies task B,
// which notifies task A again, continuously prioritizing the
// LIFO slot can cause starvation as these two tasks will
// repeatedly schedule the other. To mitigate this, we limit the
// number of times the LIFO slot is prioritized.
if lifo_polls >= MAX_LIFO_POLLS_PER_TICK {
core.lifo_enabled = false;
super::counters::inc_lifo_capped();
}
// Run the LIFO task, then loop
*self.core.borrow_mut() = Some(core);
let task = self.worker.handle.shared.owned.assert_owner(task);
task.run();

Although the comment confirms this behavior, it seems very counter intuitive to merge poll times across multiple individual polls. this seems like it could produce misleading results and report high poll times when none actually exist.

This is not precisely a bug, I'm opening this issue to discuss whether this behavior is what we want.

@rcoh rcoh added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Jan 17, 2025
@Darksonn Darksonn added the M-metrics Module: tokio/runtime/metrics label Jan 22, 2025
@carllerche
Copy link
Member

I looked at the commit where the comment was added.

It looks like the poll time measurement is conflating two things. First, the actual task poll time that the end user sees and second, the time the scheduler spends in a task "critical section" i.e. the time between points when the scheduler is free to do maintenance work like poll the I/O driver. The LIFO slot optimization groups multiple tasks into a single "unit" to improve ping-pong style messaging patterns and the scheduler treats it as one "critical section".

The task poll time histogram should report true task poll times, so we probably do want to rework this to decouple those two concepts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-metrics Module: tokio/runtime/metrics
Projects
None yet
Development

No branches or pull requests

3 participants