"unhalted reads" is a misfeature and needs to be reworked/removed #527

cbiffle · 2024-12-22T17:11:58Z

I spent the morning trying to bring up Hubris on the STM32U575 for funsies. The port was easy enough, but I kept having issues with Humility reading the archive ID. Or sometimes the tasks table. Or sometimes something else.

I noticed that it was complaining that the archive ID read as zeros, which is suspicious. By going in behind its back with openocd and forcing a halt, I was able to get it to read the archive ID correctly. I verified through openocd that reading flash while the processor is running returns zeros.

Why would Humility be reading flash while the processor is running? Haven't I filed a dozen bugs over the years about inconsistent reads racing the processor?

I dug in and found this:

humility/humility-core/src/hubris.rs

Lines 3682 to 3688 in c0b8876

    
           pub fn unhalted_reads(&self) -> bool { 
        
               if let Some(ref target) = self.manifest.target { 
        
                   target != "thumbv6m-none-eabi" 
        
               } else { 
        
                   false 
        
               } 
        
           }

In other words, the core is silently turning halt_and_read into just_read_racing_the_processor on anything that isn't a v6m part. This is presumably because the first place halting reads loudly broke was on STM32G0, but they're subtly broken on all of our processors. Whether an unhalted read to an address range might succeed depends on several factors:

Vendor implementation decisions. STM32H7 racing reads to flash work; STM32U5 they do not.
What the program is doing. This has been a perennial problem in the task table, where you're basically always racing a mutator (the kernel) resulting in inconsistent state if the system is even lightly loaded.
Debug probe timing. Different debug probes do things faster or slower; one may appear to succeed when another fails consistently.

This feature needs to be rethought or, more likely, removed. I have disabled it locally and have the STM32U5 working on my fork. I suspect this exists exclusively to allow humility dashboard to appear to run unintrusively, but IMO if humility dashboard really wants inconsistent reads, it should be using an inconsistent read operation and spare all the other commands.

I suspect that this feature is responsible for #526 but I haven't proven it yet.

The text was updated successfully, but these errors were encountered:

This fixes oxidecomputer#527 enough to be able to use the STM32U575. I am still testing to see whether I can also get consistent reads on the L4 now.

I've been tracking down reasons why Humility sometimes gives an inconsistent snapshot of system state in things like the `tasks` output, and I've found one of the culprits. There's this `halt_and_read` function that we use when we want a consistent snapshot. Uses of this function often carry comments like "We read the entire [thing being read] at a go to get as consistent a snapshot as possible." Unfortunately, despite its name, `halt_and_read` doesn't necessarily halt, and this decision is not in control of the caller or obvious to readers of the calling code. The decision to halt is a little complex and depends on... - The target triple -- after discovering an M0 that doesn't support unhalted reads to flash, we appear to have opted out the entire thumbv6m architecture from this meddling. - The exact dynamic call path to reach this point and whether anyone has called halt(), thereby incrementing the `halted` variable (the halt() call _does_ in fact halt; the `op_start` call is similarly shaped but only _maybe_ halts) This behavior has been responsible for a number of bugs I keep hitting when trying to debug _other_ bugs, which is the worst time to be hitting such bugs. This includes - Intermittent failures in archive ID matching on the STM32U5 series, where debugger flash reads are not supported while the CPU is executing (as on the STM32G0, but unlike the STM32H7). - Task status output that is inconsistent, e.g. no tasks showing as runnable or multiple tasks ready but only the idle task actually running. Both of these have had me convinced that we had scheduler bugs; so far it does not appear that we have scheduler bugs, it's just the debugger misrepresenting things. Aside: remember that whether debugger accesses to buses are allowed to race the CPU is a _vendor dependent decision_ that varies from SoC to SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent on all parts from ST; it is not consistent on all parts built around the Cortex-M33; it is not consistent across flash, RAM, and peripherals in a single part. Because commercial tools generally don't _do_ this, vendors don't document the specific behavior, and you get to discover it by things failing. In short, doing unhalted reads of anything is asking for flake and should only be done in very specific circumstances. This commit takes a somewhat heavy-handed approach and simply removes the secret unhalted reads behind `halt_and_read`. There may be situations where we legitimately want unhalted reads -- humility dashboard comes to mind -- and IMO these commands should be using an explicit `read_without_halting` operation that is defined as racing, and therefore as "maybe doesn't work on your chip." I have not done that here. Fixes oxidecomputer#527.

cbiffle added a commit to cbiffle/humility that referenced this issue Dec 22, 2024

humility-core: hackishly disable unhalted_reads

ede66a1

This fixes oxidecomputer#527 enough to be able to use the STM32U575. I am still testing to see whether I can also get consistent reads on the L4 now.

cbiffle added a commit to cbiffle/humility that referenced this issue Jan 25, 2025

humility-core: hackishly disable unhalted_reads

b2a0cd2

This fixes oxidecomputer#527 enough to be able to use the STM32U575. I am still testing to see whether I can also get consistent reads on the L4 now.

cbiffle mentioned this issue Jan 25, 2025

Remove almost all "unhalted reads" and make tasks display consistent #532

Merged

cbiffle closed this as completed in #532 Jan 27, 2025

cbiffle closed this as completed in 1b2e361 Jan 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"unhalted reads" is a misfeature and needs to be reworked/removed #527

"unhalted reads" is a misfeature and needs to be reworked/removed #527

cbiffle commented Dec 22, 2024

"unhalted reads" is a misfeature and needs to be reworked/removed #527

"unhalted reads" is a misfeature and needs to be reworked/removed #527

Comments

cbiffle commented Dec 22, 2024