podman machine alignment with bootc install #19899
@baude @vrothberg @mheon @umohnani8 @n1hility Thoughts?
@cgwalters interesting. are you guys thinking of dropping platform/hypervisor image builds in fcos long term, in favor of this containerized fetch model?
Well...perhaps more that this would be a new thing that isn't FCOS, actually.
We want to make it easy for users to be able to use different kinds of podman machines. RHIVOS for one, potentially RCOS, and many home grown ones. Currently it is difficult to build these and we want to make it simpler.
@baude also, re having external OS updates: https://groups.google.com/g/kubevirt-dev/c/K-jNJL_Y9bA/m/ZTH78OqFBAAJ
We had a realtime chat about a bunch of issues related to this. A few notes:
Long term, I think it'd be better to switch to a more minimal (Sagano) derived image and the ISO per above.
We need to start a requirements doc for what Podman Machine needs. One requirement: as soon as podman 4.7 is released, a podman-machine 4.7 image should be available. Similarly, podman machine/podman desktop need a mechanism to warn users if they are using an out-of-date podman-machine. Currently users only update ad hoc, when they destroy and re-add the machine, which means users could run the same podman-machine for years without updating.
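A minimal sketch of the kind of out-of-date warning described above, assuming the check simply compares the client version against the server (machine-side) version reported by podman; the exact template fields and how this would be wired into podman machine are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical version-drift check, not actual podman machine code.
client_ver=$(podman version --format '{{.Client.Version}}')
server_ver=$(podman version --format '{{.Server.Version}}' 2>/dev/null || true)

if [ -n "$server_ver" ] && [ "$client_ver" != "$server_ver" ]; then
    echo "warning: podman client ($client_ver) and machine ($server_ver) versions differ;" \
         "consider updating or recreating the podman machine" >&2
fi
```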
@containers/podman-desktop-maintainers FYI
Right, sorry, the other thing I meant to say is that right now, podman could/should switch to shipping ... (We could continue to apply-live those changes, but there will very likely be more skew between the disk image and the container image, and doing an apply-live flow on that may not always be reliable.)
What's the infrastructure to build containers for the github.com/containers org today? Is it https://github.com/containers/automation_images? (Hmm...one thing definitely related to this of course is that that tooling could itself be switched over to build derived bootable images too for the server side builders...)
@cevich PTAL
What's in containers/automation_images today is 90% geared toward building CI VM images. I'm toying with some container building stuff there, but honestly it's not really a great place for it. We should have a dedicated repo that's clean/fresh/un-muddied/simple. Anyway, deciding where to build should be driven largely by how the builds need to be triggered:
Bonus chapter: WRT testing the image in PRs (does it build, and do podman-machine tests break with it?): that probably has to happen in our existing Cirrus-CI setup. Though this is a somewhat easier problem to solve, since (hopefully/probably) the test-built images don't need to be pushed anywhere. Point is, how the images are built may be important if it needs to run in multiple contexts, i.e. we cannot easily re-use github-actions under Cirrus-CI. So bash-driven build scripts would be preferred.
But it is the thing that builds https://quay.io/repository/buildah/stable - no?
Maybe by the end of the week? Hopefully? I'm not thrilled with doing it there since it complicates an already complex repo. So esp. if there are other image builds needed, I'd prefer to have a fresh and clean repo. Maybe with a nice/helpful README and some PR-triggered test-builds. The only reason I can think of against this (as mentioned in my book above) is if there's a need to trigger exactly 1:1 based on merges to podman main.
I can't canonically speak for podman but I am pretty sure there's no such requirement; the main goal would be tags for stable releases and a rolling main+latest pair.
OK, right. I had that impression. On the coreos side we also have these Jenkins jobs, two of which build container images (but multi-arch!), e.g. this one, which is so much code to just build a basic container image. AFAICS containers/automation_images uses full emulation, but the CoreOS team has set up dedicated multi-arch workers (for a few reasons; but basically the arch is a lot more relevant for the base OS). As we (ideally) look to align the containers/ and coreos/ github organizations and teams a bit more, it probably makes sense to figure out how to share infrastructure and tooling here.
IKR! It's kind of absurd how much is needed.
Yes, emulated builds are for-sure less than ideal for many reasons.
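For context, the multi-arch build itself can be expressed compactly with podman; the image name below is a placeholder, and whether the non-native platform is handled by qemu emulation or by a dedicated native worker is exactly the infrastructure question being discussed:

```bash
# Build both architectures into one manifest list (emulated or native,
# depending on the builder), then push the whole list at once.
podman build --platform linux/amd64,linux/arm64 \
    --manifest quay.io/example/podman-machine-os:latest .

podman manifest push --all quay.io/example/podman-machine-os:latest \
    docker://quay.io/example/podman-machine-os:latest
```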
Urvashi is finishing up a ... Anyway, are your Jenkins jobs a good place for this image build, or were you also looking to offload that somewhere? Unless your builders are accessible from the internet, I'd be limited to x86 and arm builders. I'm slightly okay with setting up/maintaining a build-farm, but would prefer not to.
Given the toplevel goal here is that container images are generated that are lifecycled to podman, under its own control and tooling, it'd be rather ironic if we said it should be built in the CoreOS jenkins 😄 ISTM what we really want is for the non-boot containers (e.g. coreos-assembler today) to be generated by a container build system that's shared with other teams. Then, the podman-boot image that derives from FCOS can be built in that place too.
+1 IMO new builds are much easier to advance rapidly on when you can isolate them with their own setup. You can always merge them back wherever if needed.
I think there is a 1:1 need, but it can be distilled into: ideally, when testing runs, you want the machine used for testing to include the CI-built main version of podman. That doesn't necessarily equate to requiring a full image, though; it could be appropriate to just apply a freshly built podman package on top of the image, or some other manual layer override for the purpose of testing (at the end of the day you are really just testing a podman linux binary + a podman host binary). It sounds like a full stream update comes into play on a less frequent basis (perhaps daily/hourly).
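As an illustration of the "apply a freshly built podman package on top of the image" / manual layer override idea, a test job could layer the PR-built binary onto a published machine image; the image name, tag, and binary path here are hypothetical, not the project's real artifacts:

```bash
# Hypothetical test-only override image; everything named here is a placeholder.
cat > Containerfile.pr <<'EOF'
FROM quay.io/example/podman-machine-os:main
# Drop the freshly built linux podman binary on top of the published image.
COPY bin/podman /usr/bin/podman
EOF

podman build -t localhost/podman-machine-os:pr-under-test -f Containerfile.pr .
```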
BTW, one slight tangent I should bring up is that our WSL backend is package-based Fedora and not FCOS-derived. The primary reason, at the time, was that WSL distributions have a lifecycle that is independent of kernel boot, and are not even in control of their own init. For all intents and purposes, you can view a WSL distribution as a privileged container (and in fact it's implemented using linux namespaces). The entry point is either a manual on-demand script (where we control the lifecycle via machine start, bootstrap systemd etc), or, in very recent versions, systemd units (they now finally have built-in systemd support - only really usable in preview builds but it is there now). At the time I did hack up an experimental in-place ostree bootstrap, but doing it properly would have meant requiring a formal WSL bootstrap and distribution to land in fcos, so in the interest of time we opted for just straight usage of package-based Fedora initially. It sounds like this bootc-related work may be close to what would be required under WSL. The major difference being no initial OS: you just need a first-boot, container-like image that does the image fetch/install to the new ostree. wdyt @cgwalters ? (Edit: to be clear, the kernel replacement aspects would also be skipped as part of install since WSL is in control of the kernel)
(I'd s/fedora/package-based fedora/ there - fcos is Fedora too)
good point, I always found that awkward to say, changed it!
Bigger picture, the bootc/ostree style flow is most valuable when one wants transactional in-place OS updates - i.e. when the OS upgrades itself. In the podman machine case, that's not actually on by default even! And it sounds like the WSL case is much like that. So yes, I don't see a big deal in using a package-based flow there and ignoring ostree/bootc. The neat thing of course is at least now that FCOS is a container already you can just...run it that way and ignore the ostree stuff that doesn't run when it's executed as a container. IOW, still getting the benefits of the larger CI/integration story around it at least.
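For example (exact registry path/tag should be checked against current FCOS documentation), the published FCOS container image can be exercised like any other container, which is useful for CI even though the ostree machinery is inert in that mode:

```bash
# Run the FCOS container image directly; ostree/bootloader bits simply
# don't come into play when it runs as an ordinary container.
podman run --rm -it quay.io/fedora/fedora-coreos:stable \
    bash -c 'cat /etc/os-release && rpm -q podman'
```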
So if I'm following correctly: we'd have a periodic job building FCOS images from 'main'. For PRs (potentially doing breaking podman-machine changes), we'd do basically the same as we do today for other CI testing: take the "latest" main-build of the FCOS image, then somehow/simply graft the freshly built podman-remote binary (named 'podman') into it, adjusting tests to use the "correct" binaries/image as needed. Have I got that right? Also, I've been assuming but should confirm: is this bootc FCOS image we're talking about different from the one Lokesh set up to build in a GHA workflow (on main-push)? IIRC that one is for podman-desktop testing, but maybe that's the same use-case here?
I was thinking we'd install an updated RPM in the container to avoid confusion (stale rpmdb).
Nope, it's that exact use case. So I think this is about either extending that GitHub Action, or migrating it to the same code that's pushing other podman images.
We have some packit test-builds of RPMs, but AFAIK synchronizing that activity with Cirrus-CI could be difficult. I s'pose it's more complex than simply copying a binary; there are also some default configs and other dependencies that could change in a PR.
Ahh okay, so that was done in GHA specifically because it needs to be 1:1 synchronized with what's happening on main. Maybe @lsm5 has an idea here: is there a way to tell when the COPR build for a given commit is ready? I guess it could be something simple, like repeatedly trying to curl it until it works or times out 🤔
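The "repeatedly try to curl it until it works or times out" idea could be as small as the following; the URL and timings are placeholders:

```bash
# Hypothetical polling loop: wait for a built artifact URL to become
# reachable, or fail after a deadline.
url="https://example.copr.url/results/.../podman.rpm"   # placeholder
deadline=$((SECONDS + 1800))                             # 30-minute timeout

until curl --fail --silent --head "$url" > /dev/null; do
    if [ "$SECONDS" -ge "$deadline" ]; then
        echo "timed out waiting for $url" >&2
        exit 1
    fi
    sleep 30
done
echo "artifact available: $url"
```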
Also note: it's entirely possible something could get merged, and a new main-based FCOS image isn't built in time for an otherwise-breaking CI run in a PR. This repo DOES NOT require PR-rebasing before merge. So there's definitely a (maybe small?) risk of a breaking change merging w/o being noticed: "/lgtm CI appears green".
@cevich wait-for-copr has that covered.
@cevich remind me please, will you be switching our non-fcos podman image builds from Cirrus to GHA? If that's the case I think this part gets taken care of. /cc @cgwalters
don't know why my comments are getting posted twice. I only hit the button once. Second time it's happened today. Sorry about the repeat pings if any.
we don't do merge queues yet, do we? Maybe we should?
Darn, that doesn't sound like something we should risk subjecting PR authors to.
I was thinking about it, and looked into it briefly. Then I realized it would be faster to get the old script working. Also as you've seen, GHA is simply a PITA to work with, esp. on branch-only jobs w/ lots of secrets.
IIRC the openshift-ci bot does this to a limited (short queue depth) extent. So I think testing-wise, the proper thing is probably to move the per-PR FCOS build under Cirrus-CI, so that we can easily feed that image into podman-machine tests with matching bits. I think that's the simplest PR-level solution that will also avoid most surprises at the branch-level. Lokesh, would it be easy-ish to add a Cirrus-CI build task to produce an RPM (x86_64 only for now, possibly other arches later) using the same spec/scripts consumed by packit?
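The Cirrus-CI task being asked about could reduce to a short bash script; the spec path and use of packit's local build are assumptions about how the existing packit config could be reused, not a description of the current setup:

```bash
#!/usr/bin/env bash
# Hypothetical CI step: produce an x86_64 RPM locally from the same
# in-tree spec/config that packit consumes. Paths are assumptions.
set -euo pipefail

dnf -y install packit rpm-build dnf-plugins-core
dnf -y builddep rpm/podman.spec     # assumed spec location
packit build locally                # reuse the packit config/spec for a local RPM build
find . -name '*.rpm'                # show what was produced
```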
But the build NVR currently is not the same format as what we see from packit. Packit builds use changes from ...
Ref: containers#19899 (comment)
[NO NEW TESTS NEEDED]
Signed-off-by: Lokesh Mandvekar <[email protected]>
Thanks Lokesh. I don't think the NVR should make any difference at all, it's just for CI-use and should never leave that context.
To be clear, I think the medium/long term for podman machine should look like:
There is an entirely different flow where we try to decouple podman and the base OS by default; treating podman as a floating "extension" that can be applied dynamically. It basically works this way today with e.g. Fedora Cloud precisely because podman isn't installed by default. We'd split into 3 things:
However, as we know, there are a lot of hooks/dependencies podman has into the base OS (e.g. selinux, kernel bug/feature exposure), and in practice I think there'd need to be something in podman-machine which manages tested pairs.
I did some light reading. It is not obvious to me that osbuild would support vhdx and other image needs? If not, any idea on possible interest here?
The plan is to make osbuild effectively be "the disk image builder" - if it doesn't support something today, we'll make it do so. We will drain all disk image building logic out of coreos-assembler to use this.
Feature request description
Basically switch podman machine to:
podman run --privileged <container> bootc install /dev/vda
(after a locally invoked qemu-img create, e.g.)
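One possible end-to-end shape of that flow, purely as a sketch: the image name is a placeholder, exposing the locally created disk via losetup is just one option (the issue's /dev/vda wording assumes the device is already visible to the container), and bootc's install subcommand and required flags vary between versions:

```bash
# Create an empty disk locally, expose it as a block device, and have the
# machine-os container install itself onto it via bootc. All names are
# placeholders; extra mounts/flags may be required by bootc install.
qemu-img create -f raw podman-machine.img 20G
loopdev=$(sudo losetup --find --show podman-machine.img)

sudo podman run --rm --privileged --pid=host \
    --device "$loopdev" \
    quay.io/example/podman-machine-os:latest \
    bootc install "$loopdev"

sudo losetup --detach "$loopdev"
# The raw image could then be converted (e.g. qemu-img convert -O qcow2/vhdx)
# for whichever hypervisor backend podman machine is using.
```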