Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bodhi_update_push_testing gating decision context is never honored #5476

Open
AdamWill opened this issue Sep 1, 2023 · 6 comments
Open

bodhi_update_push_testing gating decision context is never honored #5476

AdamWill opened this issue Sep 1, 2023 · 6 comments
Labels
high-gain medium-trouble Refactor Issues that are a refactor to improve maintainability for Bodhi

Comments

@AdamWill
Copy link
Contributor

AdamWill commented Sep 1, 2023

In Fedora's greenwave config, we define a decision context called bodhi_update_push_testing. The intent of this is clear: it's meant to let you gate the push of an update to updates-testing. Several packages do, in fact, define a policy for this context in their gating.yaml files, e.g. blivet-gui.

Bodhi does not actually honor this decision context, though. We do not gate pushes to updates-testing. I'm pretty sure we have not done so since 9bf71fd in 2014 - note the addition of if self.request is UpdateRequest.stable: - and I'm guessing it didn't work before that, which is why that commit added the conditional.

In a sense this is fortunate, because if we actually enforced this gating context, no update for a package which specifies it should be gated on a Fedora CI test would ever make it to updates-testing, because of the chicken/egg problem described here: Fedora CI does not schedule tests until the update reaches updates-testing (all the CI pipelines I checked trigger on the bodhi.update.status.testing.koji-build-group.build.complete message, which is only published when the update reaches updates-testing).

There's some other weirdness related to these gating points in Bodhi's web UI, as it happens. Bodhi does know about both decision contexts on some levels. If the update is pending push to testing, Bodhi's idea of whether it "passes gating" refers to the bodhi_update_push_testing context. So if the package doesn't configure any gating for that context but does configure gating for the bodhi_update_push_stable context, the web UI will tell you the update 'passes gating', even if it would actually not pass gating for push to stable. It also will not show you Fedora CI results that are 'expected' for the update to be pushed to stable until the update is pushed to testing (this confused me quite a bit recently until I worked out what was going on). Really, if we're going to fix this properly, we need to refactor updates to not just have a single test_gating_status, but know at all times whether the update is gated for push to testing and whether it's gated for push to stable, because we kinda need to know this to provide the best information in the web UI.

I broadly see two options here:

  1. Fix this all up to work properly. This would be quite a lot of work: we'd have to fix the Fedora CI chicken/egg problem, then work on the Bodhi db models and web UI to communicate better information throughout the flow; we particularly wouldn't want to let people push updates to stable when they're pending push to testing just because they pass gating for push to testing, and we'd probably want to somehow communicate both gating statuses at least sometimes...
  2. Just throw out the bodhi_update_push_testing decision context - first from the distro-wide Fedora greenwave config, then just throw all support for it out of Bodhi and clean the code up a bit. Make it official that we only support gating push-to-stable. Clearly we don't need gating on push-to-testing, since we've lived without it for a decade and I'm apparently the first person to notice it's a lie! Fixing it up would be nice - it would give us a chance to prevent broken updates reaching users who enable the updates-testing repository - but clearly the world doesn't break without that. We may decide the work to sort this all out properly is just not worth it.

CCing @msrb for a Fedora CI take, and @hluk for any greenwave take...

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 1, 2023

If you look closely at past blivet-gui updates you can see some of the weirdness here, e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2023-2f3ec99d17 . The gating status went to waiting first, then failed a couple of hours later - this would be because the required CI test had not run, because the update had not reached updates-testing yet. Note that 'failed' status would have been calculated against bodhi_update_push_testing, not bodhi_update_push_stable. Yet despite the fact that the update failed the bodhi_update_push_testing gating, the next thing that happened was that the update was...pushed to updates-testing. Five minutes later, the gating status changed to 'passed' - because as soon as the update was pushed to testing, the required CI test triggered, ran and passed.

@mattiaverga mattiaverga added Refactor Issues that are a refactor to improve maintainability for Bodhi medium-trouble high-gain labels Oct 6, 2023
@mattiaverga
Copy link
Contributor

While trying to update bodhi docs regarding update states, I realized one possible flaw.
Updates in pending moving to testing are not gated, unless they're critpath updates. For "normal" updates the gating only applies between testing to stable, with Fedora CI test run after they're pushed to testing. Correct?

So, what about updates which reach the karma threshold while they're in pending, so that they're moved directly from pending to testing?

@AdamWill
Copy link
Contributor Author

AdamWill commented Oct 6, 2023

Well, per this ticket description I don't think any updates moving to testing are gated. Not even critpath ones. The example update I linked to above - https://bodhi.fedoraproject.org/updates/FEDORA-2023-2f3ec99d17 , which went from pending to testing despite the gating status showing as "failed" - is a critpath update.

As I said, if any updates were gated on push to testing, none with push-to-testing gating configured on a CI test would ever get there, because the CI tests would never run. Am I missing something?

I'm not sure what your concern about the direct-from-pending-to-stable case is, unless you mean "how do CI tests get triggered?", which is an interesting question, actually. Let's see...

Hum. The answer appears to be "well, they don't". Compare https://bodhi.fedoraproject.org/updates/FEDORA-2023-7a4026e363 (F37) and https://bodhi.fedoraproject.org/updates/FEDORA-2023-587dc80bb1 (F38) - identical Firefox updates. The F38 one got karma fast enough that it never went to testing. The F37 one didn't, and went to testing. You can see that a CI test case - fedora-ci.koji-build.rpminspect.static-analysis - ran on the F37 update, but not the F38 update.

I don't know for sure if this could conceivably cause a gating problem. I guess the scenario is: an update has push-to-stable gated on CI tests. It is submitted and gets karma so quickly it reaches the karma stable push threshold before it's pushed to testing, so it's queued for stable. But its gating status will still be 'waiting', and the CI tests it's gated on will not trigger.

What happens next? I'm not sure. If it gets pushed to testing, we're fine (the CI tests will run and the gating status will clear). If it just sits in pending forever, there's a problem.

I don't think I've ever seen this happen in real life, but it's certainly an interesting case, and another reason to improve this stuff. Note my plan for dealing with https://pagure.io/fedora-ci/general/issue/436 is to revamp test scheduling so Fedora CI tests run as soon as an update is created, as openQA tests currently do, which should address this. When I get around to it.

@AdamWill
Copy link
Contributor Author

While working on https://pagure.io/fedora-ci/general/issue/436 , I found a test that implies there is at least one path on which some kind of push-to-testing gate is implemented: bodhi-server/tests/consumers/test_signed.py test test_consume_from_tag(self) creates an update from a tag, sets its status to pending, and fakes consuming a message indicating the package has been tagged into f30-side-tag-testing-pending(? - I think that's what sample_side_tag_message is for), with greenwave set up to return an 'all tests passed' response if asked. It then expects that the update gets moved from pending to testing in that case.

@mattiaverga
Copy link
Contributor

Can we consider this fixed by #5538 ?

@AdamWill
Copy link
Contributor Author

AdamWill commented Dec 4, 2023

No, that doesn't really change this. I was planning to look at this next, if I get time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high-gain medium-trouble Refactor Issues that are a refactor to improve maintainability for Bodhi
Projects
None yet
Development

No branches or pull requests

2 participants