mantle/kola: Add function to enhance upgrade stability #3938
mantle/kola/tests/upgrade/basic.go
Outdated
//
// Note: if systemd-run ever gains the ability to --wait when
// generating a path unit then the below can be simplified.
c.RunCmdSync(m, "sudo systemd-run -u refchanged --path-property=PathChanged=/ostree/repo/refs/heads/ostree/1/1 systemctl stop wait.service")
The canonical API for this is `/ostree/deploy`, which ostree touches whenever it changes the deployments, exactly so that other things can monitor changes:
+1
So we just need to change this to `--path-property=PathChanged=/ostree/deploy` and update the comment accordingly?
Yeah, though I wonder if it's even necessary. Once the machine goes down for a reboot, it'll stop the `wait` unit below, so you should already get the desired effect. I guess `RunCmdSync` might mark that as an error though, depending on how `systemd-run` exits.
Right. I think once the reboot actually starts, it's hard to guarantee that any info makes it back outside the VM (i.e., we'd be relying on the network still being up).
Updated the path to `/ostree/deploy` as suggested here.
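For reference, the invocation this thread converges on can be sketched as below. This is only an illustration: `pathWatchCommand` and its parameters are hypothetical names, not part of mantle/kola; the unit name `refchanged`, the `wait.service` target, and the `/ostree/deploy` path are taken from the diff and the review comments above.

```go
package main

import "fmt"

// pathWatchCommand builds a systemd-run invocation that creates a transient
// path unit watching watchPath and, once that path changes, runs
// "systemctl stop" on stopUnit.
// NOTE: this helper is hypothetical and exists only for illustration.
func pathWatchCommand(unit, watchPath, stopUnit string) string {
	return fmt.Sprintf(
		"sudo systemd-run -u %s --path-property=PathChanged=%s systemctl stop %s",
		unit, watchPath, stopUnit)
}

func main() {
	// Watch /ostree/deploy instead of a specific ref under /ostree/repo.
	fmt.Println(pathWatchCommand("refchanged", "/ostree/deploy", "wait.service"))
}
```

In the test itself the resulting string would presumably be passed straight to `c.RunCmdSync(m, ...)`, as in the original line under review.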
@@ -328,6 +349,7 @@ func rpmostreeRebase(c cluster.TestCluster, m platform.Machine, ref, version str
// we use systemd-run here so that we can test the --reboot path
// without having SSH not exit cleanly, which would cause an error
c.RunCmdSyncf(m, "sudo systemd-run rpm-ostree rebase --reboot %s", ref)
waitForUpgradeToBeStaged(c, m)
I'm not sure this makes sense here. In this case we're running rpm-ostree rebase synchronously. It'll have already done the deployment (and initiated a reboot) by the time you get to this line.
Not exactly. `systemd-run` is sending `rpm-ostree` off on its own little boat to finish independently, IIUC.
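To make the ordering concrete, here is a rough sketch of the sequence being discussed: the rebase is handed off to a transient unit, a path unit is armed, and the test then blocks on a `wait` unit until the path unit stops it. The `runner` interface, the `recorder` type, and `rebaseAndWait` are hypothetical stand-ins for `cluster.TestCluster`/`platform.Machine`, and the `sleep infinity` wait command is an assumption, since the body of `waitForUpgradeToBeStaged` is not shown in this thread.

```go
package main

import "fmt"

// runner is a hypothetical stand-in for running a command on the test machine.
type runner interface {
	Run(cmd string)
}

// recorder only records commands, so the ordering can be inspected
// without a real machine.
type recorder struct {
	cmds []string
}

func (r *recorder) Run(cmd string) { r.cmds = append(r.cmds, cmd) }

// rebaseAndWait sketches the flow under review (assumed, not the actual code).
func rebaseAndWait(r runner, ref string) {
	// 1. Without --wait, systemd-run returns once the transient unit has
	//    been started, so rpm-ostree keeps running in the background.
	r.Run(fmt.Sprintf("sudo systemd-run rpm-ostree rebase --reboot %s", ref))
	// 2. Arm a path unit that stops wait.service when /ostree/deploy changes.
	r.Run("sudo systemd-run -u refchanged --path-property=PathChanged=/ostree/deploy systemctl stop wait.service")
	// 3. Block until wait.service is stopped (assumed form of the wait unit).
	r.Run("sudo systemd-run --wait -u wait.service sleep infinity")
}

func main() {
	r := &recorder{}
	rebaseAndWait(r, "example-ref")
	for i, cmd := range r.cmds {
		fmt.Printf("%d: %s\n", i+1, cmd)
	}
}
```

Note that this sketch assumes the deployment takes longer than arming the path unit; if `/ostree/deploy` were touched before step 2, the wait would only end when the reboot itself stops the unit.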
I have made the change suggested in
LGTM
CI is failing because you need to run `go fmt` on the code.
You also need to squash the commits down into one.
This commit introduces the `waitForUpgradeToBeStaged` function to improve the stability of kola upgrade test by reducing timeout-related failures.

The new function sets up a systemd path unit to monitor updates in the `/ostree/repo/refs/heads/ostree/1/1` directory, triggering a stop on `wait.service` once changes are detected.

By ensuring we wait later in the upgrade process, we minimize the waiting period in `runFnAndWaitForRebootIntoVersion`, focusing only on the actual reboot phase.

Author: Dusty Mabe <[email protected]>
Ref: coreos/fedora-coreos-tracker#1805
LGTM