virtme-ng: return 255 on panic in script mode instead of hanging #70

Merged (1 commit) on Feb 14, 2024

@arighi commented Feb 13, 2024

If a kernel panic is triggered while running in script mode, the guest just hangs indefinitely.

Test case:

$ vng -vr -- "echo c > /proc/sysrq-trigger"

This is problematic in a CI scenario, since the indefinite hang blocks all subsequent tests or actions.

One workaround is to wrap vng in the `timeout` command, but that does not let us distinguish a genuine timeout from a kernel panic.

For this reason, always boot the guest with `panic=-1` when running in script mode. `panic=-1` makes the kernel reboot immediately on panic, and QEMU's `-no-reboot` option turns that reboot into an immediate exit, which vng can detect and report with the special exit code 255.

This makes it possible to catch kernel panics explicitly by checking whether the return code of vng is 255.

Example:

$ vng -vr -- "echo c > /proc/sysrq-trigger" 2>/tmp/kernel.log
$ [ $? -eq 255 ] && grep "Kernel panic" /tmp/kernel.log
[ 3.260927] Kernel panic - not syncing: sysrq triggered crash
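
For CI use, the exit-code check above can be combined with the `timeout` workaround mentioned earlier. The following is only a minimal sketch: the helper name `classify_vng_status` and the 300-second limit are hypothetical, and it assumes GNU `timeout`, which exits with status 124 when the time limit is hit. It does not cover the case where some other failure also ends up returning 255.

```shell
#!/bin/sh
# Hypothetical helper: map the exit status of `timeout ... vng ...`
# to a short label that a CI job can act on.
#   255 -> vng reported a kernel panic in script mode (this PR)
#   124 -> GNU timeout reported that the time limit was reached
classify_vng_status() {
    case "$1" in
        0)   echo "ok" ;;
        124) echo "timeout" ;;
        255) echo "panic" ;;
        *)   echo "failed" ;;
    esac
}

# Possible usage in a CI script (the 300-second limit is arbitrary):
#   timeout 300 vng -vr -- "echo c > /proc/sysrq-trigger" 2>/tmp/kernel.log
#   classify_vng_status $?
```

With the test case from this PR, the sketch would be expected to print `panic`, while a guest that simply never finishes would yield `timeout`.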

NOTE: this does not change the behavior in interactive mode. There it is fine to hang indefinitely by default, because we may want to attach a debugger, trigger a memory dump, etc. So this change only affects script mode for now.

Link: linux-netdev/nipa#11

Signed-off-by: Andrea Righi <[email protected]>
@arighi force-pushed the detect-kernel-panic branch from d48f841 to 838cca0 on February 13, 2024 22:39
@arighi merged commit a7025ba into main on Feb 14, 2024
8 checks passed