Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

target/riscv: Ensure to handle all triggered a halt events #1171

Open
wants to merge 1 commit into
base: riscv
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 53 additions & 0 deletions src/target/riscv/riscv.c
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@

#define RISCV_TRIGGER_HIT_NOT_FOUND ((int64_t)-1)

#define RISCV_HALT_GROUP_REPOLL_LIMIT 5

static uint8_t ir_dtmcontrol[4] = {DTMCONTROL};
struct scan_field select_dtmcontrol = {
.in_value = NULL,
Expand Down Expand Up @@ -3722,6 +3724,8 @@ int riscv_openocd_poll(struct target *target)
{
LOG_TARGET_DEBUG(target, "Polling all harts.");

struct riscv_info *i = riscv_info(target);

struct list_head *targets;

LIST_HEAD(single_target_list);
Expand All @@ -3743,6 +3747,7 @@ int riscv_openocd_poll(struct target *target)
unsigned int should_resume = 0;
unsigned int halted = 0;
unsigned int running = 0;
unsigned int cause_groups = 0;
struct target_list *entry;
foreach_smp_target(entry, targets) {
struct target *t = entry->target;
Expand Down Expand Up @@ -3790,6 +3795,52 @@ int riscv_openocd_poll(struct target *target)
LOG_TARGET_DEBUG(target, "resume all");
riscv_resume(target, true, 0, 0, 0, false);
} else if (halted && running) {
LOG_TARGET_DEBUG(target, "SMP group is in inconsistent state: %u halted, %u running",
halted, running);

/* The SMP group is in an inconsistent state - some harts in the group have halted
* whereas others are running. The reasons for that (and corresponding
* OpenOCD actions) could be:
* 1) The targets are in the process of halting due to halt groups
* but not all of them halted --> poll again so that the halt reason of every
* hart can be accurately determined (e.g. semihosting).
* 2) The targets do not support halt groups --> OpenOCD must halt
* the remaining harts by a standard halt request.
* 3) The hart states got out of sync for some other unknown reason (problem?). -->
* Same as previous - try to halt the harts by a standard halt request
* to get them back in sync. */

/* Detect if the harts are just in the process of halting due to a halt group */
foreach_smp_target(entry, targets)
{
struct target *t = entry->target;
if (t->state == TARGET_HALTED) {
riscv_reg_t dcsr;
if (riscv_reg_get(t, &dcsr, GDB_REGNO_DCSR) != ERROR_OK)
return ERROR_FAIL;
if (get_field(dcsr, CSR_DCSR_CAUSE) == CSR_DCSR_CAUSE_GROUP)
cause_groups++;
else
/* This hart has halted due to something else than a halt group.
* Don't continue checking the rest - exit early. */
break;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop seems weird. What is it's purpose? DCSR is writable, so we can occasionally trick debugger into wrong conclusion.

halt groups are an optional feature and I'm quite confused that we don't check for it.

Could you please provide a test scenario to reproduce your issue? Is it possible to use spike to model it? Or do you need a specific HW ?

Copy link
Contributor Author

@lz-bro lz-bro Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug was discovered when I was testing Semihosting on our hardware and smp is enable. It fails when the following happens:
If all current halted states are due to a halt group, and other harts state was running. In fact, there was a hart halted, which caused the other harts to halt because of the hart group.

} else if (halted && running) {
LOG_TARGET_DEBUG(target, "halt all; halted=%d",
halted);
riscv_halt(target);
} else {

If there is such a halted hart,but the record status is running,it would not process riscv_semihosting.
if (halt_reason == RISCV_HALT_EBREAK) {
int retval;
/* Detect if this EBREAK is a semihosting request. If so, handle it. */
switch (riscv_semihosting(target, &retval)) {
case SEMIHOSTING_NONE:
break;
case SEMIHOSTING_WAITING:
/* This hart should remain halted. */
*next_action = RPH_REMAIN_HALTED;
break;
case SEMIHOSTING_HANDLED:
/* This hart should be resumed, along with any other
* harts that halted due to haltgroups. */
*next_action = RPH_RESUME;
return ERROR_OK;
case SEMIHOSTING_ERROR:
return retval;
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aap-sc I think this is a bug, would you provide some suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lz-bro I'm still trying to understand your reasoning and what the issue is exactly (the situation is still not quite obvious to me). It will take a couple of days - I'll ask additional question if necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I have figured out the code thanks to the detailed description that @zqb-all has provided:

If the harts are just in the process of halting due to being members of the halt group, we should wait until they finish halting, so that the true halt reason can be discovered - for instance semihosting request - and then handled correctly.

if (halted == cause_groups) {
LOG_TARGET_DEBUG(target, "The harts appear to just be in the process of halting due to a halt group.");
if (i->halt_group_repoll_count < RISCV_HALT_GROUP_REPOLL_LIMIT) {
/* Wait a little, then re-poll. */
i->halt_group_repoll_count++;
alive_sleep(10);
LOG_TARGET_DEBUG(target, "Re-polling the state of the SMP group.");
return riscv_openocd_poll(target);
}
/* We have already re-polled multiple times but the halt group is still inconsistent. */
LOG_TARGET_DEBUG(target, "Re-polled the SMP group %d times it is still not in a consistent state.",
RISCV_HALT_GROUP_REPOLL_LIMIT);
}

/* Halting the whole SMP group to bring it in sync. */
LOG_TARGET_DEBUG(target, "halt all; halted=%d",
halted);
riscv_halt(target);
Expand All @@ -3807,6 +3858,8 @@ int riscv_openocd_poll(struct target *target)
}
}

i->halt_group_repoll_count = 0;

/* Call tick() for every hart. What happens in tick() is opaque to this
* layer. The reason it's outside the previous loop is that at this point
* the state of every hart has settled, so any side effects happening in
Expand Down
1 change: 1 addition & 0 deletions src/target/riscv/riscv.h
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,7 @@ struct riscv_info {
/* Used by riscv_openocd_poll(). */
bool halted_needs_event_callback;
enum target_event halted_callback_event;
unsigned int halt_group_repoll_count;

enum riscv_isrmasking_mode isrmask_mode;

Expand Down
Loading