Skip to content

Commit

Permalink
Ensure that halt_and_read always halts, and reads
Browse files Browse the repository at this point in the history
I've been tracking down reasons why Humility sometimes gives an
inconsistent snapshot of system state in things like the `tasks` output,
and I've found one of the culprits.

There's this `halt_and_read` function that we use when we want a
consistent snapshot. Uses of this function often carry comments like "We
read the entire [thing being read] at a go to get as consistent a
snapshot as possible."

Unfortunately, despite its name, `halt_and_read` doesn't necessarily
halt, and this decision is not in control of the caller or obvious to
readers of the calling code. The decision to halt is a little complex
and depends on...

- The target triple -- after discovering an M0 that doesn't support
  unhalted reads to flash, we appear to have opted out the entire
  thumbv6m architecture from this meddling.

- The exact dynamic call path to reach this point and whether anyone has
  called halt(), thereby incrementing the `halted` variable (the halt()
  call _does_ in fact halt; the `op_start` call is similarly shaped but
  only _maybe_ halts)

This behavior has been responsible for a number of bugs I keep hitting
when trying to debug _other_ bugs, which is the worst time to be hitting
such bugs. This includes

- Intermittent failures in archive ID matching on the STM32U5 series,
  where debugger flash reads are not supported while the CPU is
  executing (as on the STM32G0, but unlike the STM32H7).

- Task status output that is inconsistent, e.g. no tasks showing as
  runnable or multiple tasks ready but only the idle task actually
  running. Both of these have had me convinced that we had scheduler
  bugs; so far it does not appear that we have scheduler bugs, it's just
  the debugger misrepresenting things.

Aside: remember that whether debugger accesses to buses are allowed to
race the CPU is a _vendor dependent decision_ that varies from SoC to
SoC. It is not consistent on all ARMv6/7/8-M parts; it is not consistent
on all parts from ST; it is not consistent on all parts built around the
Cortex-M33; it is not consistent across flash, RAM, and peripherals in a
single part. Because commercial tools generally don't _do_ this, vendors
don't document the specific behavior, and you get to discover it by
things failing. In short, doing unhalted reads of anything is asking for
flake and should only be done in very specific circumstances.

This commit takes a somewhat heavy-handed approach and simply removes
the secret unhalted reads behind `halt_and_read`. There may be
situations where we legitimately want unhalted reads -- humility
dashboard comes to mind -- and IMO these commands should be using an
explicit `read_without_halting` operation that is defined as racing, and
therefore as "maybe doesn't work on your chip." I have not done that
here.

Fixes oxidecomputer#527.
  • Loading branch information
cbiffle committed Jan 27, 2025
1 parent caa1370 commit 3e735cc
Show file tree
Hide file tree
Showing 7 changed files with 24 additions and 72 deletions.
3 changes: 1 addition & 2 deletions cmd/bankerase/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -63,8 +63,7 @@ fn bankerasecmd(context: &mut ExecutionContext) -> Result<()> {
};

humility::msg!("attaching with chip set to {chip:x?}");
let mut c =
humility_probes_core::attach_for_flashing(probe, hubris, &chip)?;
let mut c = humility_probes_core::attach_for_flashing(probe, &chip)?;
let core = c.as_mut();

let ihex = tempfile::NamedTempFile::new()?;
Expand Down
3 changes: 1 addition & 2 deletions cmd/flash/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -367,8 +367,7 @@ fn flashcmd(context: &mut ExecutionContext) -> Result<()> {
};

humility::msg!("attaching with chip set to {chip:x?}");
let mut c =
humility_probes_core::attach_for_flashing(probe, hubris, &chip)?;
let mut c = humility_probes_core::attach_for_flashing(probe, &chip)?;
let core = c.as_mut();

validate(hubris, core, &subargs)?;
Expand Down
2 changes: 1 addition & 1 deletion cmd/reset/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ fn reset(context: &mut ExecutionContext) -> Result<()> {
"Need a chip to do a soft reset or halt after reset"
)
})?;
humility_probes_core::attach_to_chip(probe, hubris, Some(&chip))?
humility_probes_core::attach_to_chip(probe, Some(&chip))?
} else {
humility_probes_core::attach_to_probe(probe)?
};
Expand Down
17 changes: 1 addition & 16 deletions cmd/tasks/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -269,7 +269,6 @@ pub fn print_tasks(
}

let mut tasks = vec![];
let mut panicked = false;
let mut regs = HashMap::new();

for i in 0..task_count {
Expand Down Expand Up @@ -300,18 +299,6 @@ pub fn print_tasks(
}

tasks.push((i, addr, task_value, task));

if let TaskState::Faulted { fault, .. } = task.state {
if fault == doppel::FaultInfo::Panic {
panicked = true;
}
}
}

let keep_halted = stack || registers || panicked;

if !keep_halted {
core.run()?;
}

writeln!(
Expand Down Expand Up @@ -480,9 +467,7 @@ pub fn print_tasks(
)?;
}

if keep_halted {
core.run()?;
}
core.run()?;

if task_arg.is_some() && !found {
bail!("\"{}\" is not a valid task", task_arg.unwrap());
Expand Down
8 changes: 0 additions & 8 deletions humility-core/src/hubris.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3679,14 +3679,6 @@ impl HubrisArchive {
}
}

pub fn unhalted_reads(&self) -> bool {
if let Some(ref target) = self.manifest.target {
target != "thumbv6m-none-eabi"
} else {
false
}
}

/// Reads the auxiliary flash data from a Hubris archive
///
/// Returns `Ok(Some(...))` if the data is loaded, `Ok(None)` if the file
Expand Down
28 changes: 8 additions & 20 deletions humility-probes-core/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,6 @@ pub fn attach_to_probe(probe: &str) -> Result<Box<dyn Core>> {
#[rustfmt::skip::macros(anyhow, bail)]
pub fn attach_to_chip(
probe: &str,
hubris: &HubrisArchive,
chip: Option<&str>,
) -> Result<Box<dyn Core>> {
let (probe, index) = parse_probe(probe);
Expand Down Expand Up @@ -191,7 +190,6 @@ pub fn attach_to_chip(
probe_info.vendor_id,
probe_info.product_id,
probe_info.serial_number,
hubris.unhalted_reads(),
can_flash,
)))
}
Expand All @@ -209,15 +207,15 @@ pub fn attach_to_chip(
}

"auto" => {
if let Ok(probe) = attach_to_chip("ocd", hubris, chip) {
if let Ok(probe) = attach_to_chip("ocd", chip) {
return Ok(probe);
}

if let Ok(probe) = attach_to_chip("jlink", hubris, chip) {
if let Ok(probe) = attach_to_chip("jlink", chip) {
return Ok(probe);
}

attach_to_chip("usb", hubris, chip)
attach_to_chip("usb", chip)
}

"ocdgdb" => {
Expand Down Expand Up @@ -256,31 +254,21 @@ pub fn attach_to_chip(
crate::msg!("attached to {vidpid} via {name}");

Ok(Box::new(probe_rs::ProbeCore::new(
session,
name,
vid,
pid,
serial,
hubris.unhalted_reads(),
can_flash,
session, name, vid, pid, serial, can_flash,
)))
}
Err(_) => Err(anyhow!("unrecognized probe: {probe}")),
},
}
}

pub fn attach_for_flashing(
probe: &str,
hubris: &HubrisArchive,
chip: &str,
) -> Result<Box<dyn Core>> {
attach_to_chip(probe, hubris, Some(chip))
pub fn attach_for_flashing(probe: &str, chip: &str) -> Result<Box<dyn Core>> {
attach_to_chip(probe, Some(chip))
}

pub fn attach(probe: &str, hubris: &HubrisArchive) -> Result<Box<dyn Core>> {
match hubris.chip() {
Some(s) => attach_to_chip(probe, hubris, Some(&s)),
None => attach_to_chip(probe, hubris, None),
Some(s) => attach_to_chip(probe, Some(&s)),
None => attach_to_chip(probe, None),
}
}
35 changes: 12 additions & 23 deletions humility-probes-core/src/probe_rs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ pub struct ProbeCore {
pub vendor_id: u16,
pub product_id: u16,
pub serial_number: Option<String>,
unhalted_reads: bool,
halted: u32,
unhalted_read: BTreeMap<u32, u32>,
can_flash: bool,
Expand All @@ -32,7 +31,6 @@ impl ProbeCore {
vendor_id: u16,
product_id: u16,
serial_number: Option<String>,
unhalted_reads: bool,
can_flash: bool,
) -> Self {
Self {
Expand All @@ -41,7 +39,6 @@ impl ProbeCore {
vendor_id,
product_id,
serial_number,
unhalted_reads,
halted: 0,
unhalted_read: humility_arch_arm::unhalted_read_regions(),
can_flash,
Expand All @@ -54,24 +51,20 @@ impl ProbeCore {
) -> Result<()> {
let mut core = self.session.core(0)?;

if self.unhalted_reads {
func(&mut core)
let halted = if self.halted == 0 && !core.core_halted()? {
core.halt(std::time::Duration::from_millis(1000))?;
true
} else {
let halted = if self.halted == 0 && !core.core_halted()? {
core.halt(std::time::Duration::from_millis(1000))?;
true
} else {
false
};

let rval = func(&mut core);
false
};

if halted {
core.run()?;
}
let rval = func(&mut core);

rval
if halted {
core.run()?;
}

rval
}
}

Expand Down Expand Up @@ -313,17 +306,13 @@ impl Core for ProbeCore {
}

fn op_start(&mut self) -> Result<()> {
if !self.unhalted_reads {
self.halt()?;
}
self.halt()?;

Ok(())
}

fn op_done(&mut self) -> Result<()> {
if !self.unhalted_reads {
self.run()?;
}
self.run()?;

Ok(())
}
Expand Down

0 comments on commit 3e735cc

Please sign in to comment.