`poll` time in the poll time histogram merges multiple polls #7106

rcoh · 2025-01-17T20:16:25Z

tokio/tokio/src/runtime/scheduler/multi_thread/worker.rs

Lines 585 to 659 in a82bdee

    
           // Measure the poll start time. Note that we may end up polling other 
        
           // tasks under this measurement. In this case, the tasks came from the 
        
           // LIFO slot and are considered part of the current task for scheduling 
        
           // purposes. These tasks inherent the "parent"'s limits. 
        
           core.stats.start_poll(); 
        
           // Make the core available to the runtime context 
        
           *self.core.borrow_mut() = Some(core); 
        
           // Run the task 
        
           coop::budget(|| { 
        
               task.run(); 
        
               let mut lifo_polls = 0; 
        
               // As long as there is budget remaining and a task exists in the 
        
               // `lifo_slot`, then keep running. 
        
               loop { 
        
                   // Check if we still have the core. If not, the core was stolen 
        
                   // by another worker. 
        
                   let mut core = match self.core.borrow_mut().take() { 
        
                       Some(core) => core, 
        
                       None => { 
        
                           // In this case, we cannot call `reset_lifo_enabled()` 
        
                           // because the core was stolen. The stealer will handle 
        
                           // that at the top of `Context::run` 
        
                           return Err(()); 
        
                       } 
        
                   }; 
        
                   // Check for a task in the LIFO slot 
        
                   let task = match core.lifo_slot.take() { 
        
                       Some(task) => task, 
        
                       None => { 
        
                           self.reset_lifo_enabled(&mut core); 
        
                           core.stats.end_poll(); 
        
                           return Ok(core); 
        
                       } 
        
                   }; 
        
                   if !coop::has_budget_remaining() { 
        
                       core.stats.end_poll(); 
        
                       // Not enough budget left to run the LIFO task, push it to 
        
                       // the back of the queue and return. 
        
                       core.run_queue.push_back_or_overflow( 
        
                           task, 
        
                           &*self.worker.handle, 
        
                           &mut core.stats, 
        
                       ); 
        
                       // If we hit this point, the LIFO slot should be enabled. 
        
                       // There is no need to reset it. 
        
                       debug_assert!(core.lifo_enabled); 
        
                       return Ok(core); 
        
                   } 
        
                   // Track that we are about to run a task from the LIFO slot. 
        
                   lifo_polls += 1; 
        
                   super::counters::inc_lifo_schedules(); 
        
                   // Disable the LIFO slot if we reach our limit 
        
                   // 
        
                   // In ping-ping style workloads where task A notifies task B, 
        
                   // which notifies task A again, continuously prioritizing the 
        
                   // LIFO slot can cause starvation as these two tasks will 
        
                   // repeatedly schedule the other. To mitigate this, we limit the 
        
                   // number of times the LIFO slot is prioritized. 
        
                   if lifo_polls >= MAX_LIFO_POLLS_PER_TICK { 
        
                       core.lifo_enabled = false; 
        
                       super::counters::inc_lifo_capped(); 
        
                   } 
        
                   // Run the LIFO task, then loop 
        
                   *self.core.borrow_mut() = Some(core); 
        
                   let task = self.worker.handle.shared.owned.assert_owner(task); 
        
                   task.run();

Although the comment confirms this behavior, it seems very counter intuitive to merge poll times across multiple individual polls. this seems like it could produce misleading results and report high poll times when none actually exist.

This is not precisely a bug, I'm opening this issue to discuss whether this behavior is what we want.

carllerche · 2025-01-22T18:12:41Z

I looked at the commit where the comment was added.

It looks like the poll time measurement is conflating two things. First, the actual task poll time that the end user sees and second, the time the scheduler spends in a task "critical section" i.e. the time between points when the scheduler is free to do maintenance work like poll the I/O driver. The LIFO slot optimization groups multiple tasks into a single "unit" to improve ping-pong style messaging patterns and the scheduler treats it as one "critical section".

The task poll time histogram should report true task poll times, so we probably do want to rework this to decouple those two concepts.

rcoh added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Jan 17, 2025

Darksonn added the M-metrics Module: tokio/runtime/metrics label Jan 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`poll` time in the poll time histogram merges multiple polls #7106

`poll` time in the poll time histogram merges multiple polls #7106

rcoh commented Jan 17, 2025

carllerche commented Jan 22, 2025

poll time in the poll time histogram merges multiple polls #7106

poll time in the poll time histogram merges multiple polls #7106

Comments

rcoh commented Jan 17, 2025

carllerche commented Jan 22, 2025

`poll` time in the poll time histogram merges multiple polls #7106

`poll` time in the poll time histogram merges multiple polls #7106