Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"unhalted reads" is a misfeature and needs to be reworked/removed #527

Closed
cbiffle opened this issue Dec 22, 2024 · 0 comments · Fixed by #532
Closed

"unhalted reads" is a misfeature and needs to be reworked/removed #527

cbiffle opened this issue Dec 22, 2024 · 0 comments · Fixed by #532

Comments

@cbiffle
Copy link
Contributor

cbiffle commented Dec 22, 2024

I spent the morning trying to bring up Hubris on the STM32U575 for funsies. The port was easy enough, but I kept having issues with Humility reading the archive ID. Or sometimes the tasks table. Or sometimes something else.

I noticed that it was complaining that the archive ID read as zeros, which is suspicious. By going in behind its back with openocd and forcing a halt, I was able to get it to read the archive ID correctly. I verified through openocd that reading flash while the processor is running returns zeros.

Why would Humility be reading flash while the processor is running? Haven't I filed a dozen bugs over the years about inconsistent reads racing the processor?

I dug in and found this:

pub fn unhalted_reads(&self) -> bool {
if let Some(ref target) = self.manifest.target {
target != "thumbv6m-none-eabi"
} else {
false
}
}

In other words, the core is silently turning halt_and_read into just_read_racing_the_processor on anything that isn't a v6m part. This is presumably because the first place halting reads loudly broke was on STM32G0, but they're subtly broken on all of our processors. Whether an unhalted read to an address range might succeed depends on several factors:

  1. Vendor implementation decisions. STM32H7 racing reads to flash work; STM32U5 they do not.
  2. What the program is doing. This has been a perennial problem in the task table, where you're basically always racing a mutator (the kernel) resulting in inconsistent state if the system is even lightly loaded.
  3. Debug probe timing. Different debug probes do things faster or slower; one may appear to succeed when another fails consistently.

This feature needs to be rethought or, more likely, removed. I have disabled it locally and have the STM32U5 working on my fork. I suspect this exists exclusively to allow humility dashboard to appear to run unintrusively, but IMO if humility dashboard really wants inconsistent reads, it should be using an inconsistent read operation and spare all the other commands.

I suspect that this feature is responsible for #526 but I haven't proven it yet.

cbiffle added a commit to cbiffle/humility that referenced this issue Dec 22, 2024
This fixes oxidecomputer#527 enough to be able to use the STM32U575. I am still
testing to see whether I can also get consistent reads on the L4 now.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 25, 2025
This fixes oxidecomputer#527 enough to be able to use the STM32U575. I am still
testing to see whether I can also get consistent reads on the L4 now.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 25, 2025
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 25, 2025
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 25, 2025
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 25, 2025
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 27, 2025
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
cbiffle added a commit to cbiffle/humility that referenced this issue Jan 27, 2025
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant