Showing 13 changed files with 207 additions and 27 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# **Description:** | ||
# > IMPORTANT NOTE: This is BOOTC. This is meant for bootable container applications. See: https://github.com/containers/podman-desktop-extension-bootc | ||
# | ||
# This is a "base" container that installs the nvidia drivers and the nvidia container toolkit. | ||
# This is meant to be used as a base for other containers that need GPU access. | ||
# | ||
# DISABLE SECURE BOOT! You have been warned! Secure Boot is **KNOWN** to cause issues with the nvidia drivers. | ||
# ENABLE 4G DECODING in the BIOS. This is needed for certain nvidia cards to work such as the Tesla P40. | ||
# | ||
# This uses CentOS Stream 9 as the base image to (hopefully) be as stable as possible. Fedora 40 was tried first, but its kernel was moving too fast | ||
# for the nvidia drivers to keep up / work properly / update correctly. | ||
# | ||
# IMPORTANT NOTE: | ||
# On boot, the nvidia drivers will **not** be loaded until they are compiled. This is because akmods are supposed to be built on boot, but this doesn't work with bootc. | ||
# Instead, nvidia-drivers.service recompiles the drivers with akmods and loads them with modprobe on boot, which may take a minute. | ||
# If you have any systemd services that require the nvidia drivers, you will need to add `After=nvidia-drivers.service` to the service or have it start LATE in the boot order (e.g. multi-user.target) | ||
# to ensure that the nvidia drivers are loaded before the service starts. | ||
# | ||
# For example, if you have a podman container with --restart=always, the drivers must be loaded before podman-restart.service (and its podman-restart.timer) starts it. | ||
# This ordering has already been done for you via `Before=` in the nvidia-drivers.service and nvidia-toolkit-firstboot.service files. | ||
# | ||
# Note about the nvidia-toolkit-firstboot.service file: This is a one-time service on boot that will create the /etc/cdi/nvidia.yaml file. This is necessary for podman | ||
# to use GPU devices. | ||
# | ||
# | ||
# **Running:** | ||
# 1. In your OTHER Containerfile, use `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` (the image built from this Containerfile). | ||
# 2. The nvidia drivers will be recompiled with akmods and loaded with modprobe on boot. | ||
# 3. Run the `nvidia-smi` command on the booted image to see if it works. | ||
FROM quay.io/centos-bootc/centos-bootc:stream9 | ||
|
||
#! Set the kernel version, as we MUST install the kernel-devel package for the kernel used in the base image. It must match what stream9 ships, which is unpredictable at times. | ||
#! This is because the base image can carry an older kernel between the time it was built and the time the newest kernel is released. | ||
#! For example, as of writing the kernel is 5.14.0-526.el9.x86_64 in the base OS, but `dnf install kernel-devel` installs 5.14.0-527.el9.x86_64, causing a conflict / mismatch, | ||
#! especially for NVIDIA drivers which are very picky about the kernel version. | ||
ARG KERNEL_VERSION='5.14.0-527.el9.x86_64' | ||
|
||
#! Install EPEL and the rpmfusion free and nonfree repos for access to the nvidia drivers | ||
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \ | ||
dnf install --nogpgcheck -y https://mirrors.rpmfusion.org/free/el/rpmfusion-free-release-$(rpm -E %rhel).noarch.rpm && \ | ||
dnf install --nogpgcheck -y https://mirrors.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-$(rpm -E %rhel).noarch.rpm | ||
|
||
#! Install the kernel, devel and headers | ||
RUN dnf install -y kernel-$KERNEL_VERSION kernel-devel-$KERNEL_VERSION kernel-headers-$KERNEL_VERSION | ||
|
||
#! Make sure the kernel installed is part of the initramfs | ||
#! this is a "forced" upgrade of the initramfs to ensure that the kernel is part of the initramfs / we use the updated kernel | ||
#! we are also required to delete all other kernels in /usr/lib/modules that are not $KERNEL_VERSION | ||
RUN set -x; dracut -vf /usr/lib/modules/$KERNEL_VERSION/initramfs.img $KERNEL_VERSION | ||
|
||
#! Delete everything in /usr/lib/modules that is not $KERNEL_VERSION | ||
RUN find /usr/lib/modules -mindepth 1 -maxdepth 1 -type d -not -name $KERNEL_VERSION -exec rm -rf {} \; | ||
|
||
#! Install the nvidia drivers | ||
RUN dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda | ||
|
||
#! Install NVIDIA container toolkit | ||
RUN curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | tee /etc/yum.repos.d/nvidia-container-toolkit.repo && \ | ||
dnf install -y nvidia-container-toolkit | ||
|
||
#! Blacklist the nouveau driver to ensure NVIDIA drivers function properly | ||
RUN echo "blacklist nouveau" > /etc/modprobe.d/blacklist_nouveau.conf | ||
|
||
#! Copy necessary usr files | ||
COPY usr/ /usr/ | ||
|
||
#! Enable necessary services to be started at boot | ||
RUN systemctl enable nvidia-toolkit-firstboot.service nvidia-drivers.service nvidia-persist.service |
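The kernel pinning and cleanup steps in the Containerfile above can be rehearsed outside an image build. The following is a minimal sketch, assuming a scratch directory stands in for `/usr/lib/modules`; the helper names `prune_modules` and `only_kernel` are hypothetical and not part of this commit.

```shell
set -eu

# prune_modules MODULES_DIR KEEP_VERSION
# Remove every top-level directory under MODULES_DIR except KEEP_VERSION,
# mirroring the `find ... -not -name $KERNEL_VERSION` step in the Containerfile.
prune_modules() {
    local modules_dir="$1"
    local keep="$2"
    find "$modules_dir" -mindepth 1 -maxdepth 1 -type d \
        -not -name "$keep" -exec rm -rf {} +
}

# only_kernel MODULES_DIR
# Print the single remaining kernel version, or fail if there is not exactly one.
only_kernel() {
    local modules_dir="$1"
    local count
    count="$(find "$modules_dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')"
    [ "$count" -eq 1 ] || return 1
    basename "$(find "$modules_dir" -mindepth 1 -maxdepth 1 -type d)"
}

# Demonstration against a scratch directory (hypothetical layout, not a real /usr/lib/modules).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/5.14.0-526.el9.x86_64" "$demo_dir/5.14.0-527.el9.x86_64"
prune_modules "$demo_dir" "5.14.0-527.el9.x86_64"
only_kernel "$demo_dir"   # prints 5.14.0-527.el9.x86_64
```

A check like `only_kernel /usr/lib/modules` could, as an assumption, be added as a final `RUN` step so the build fails early if more than one kernel tree survives.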
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
**Description:** | ||
> IMPORTANT NOTE: This is BOOTC. This is meant for bootable container applications. See: https://github.com/containers/podman-desktop-extension-bootc | ||
This is a "base" container that installs the nvidia drivers and the nvidia container toolkit. | ||
This is meant to be used as a base for other containers that need GPU access. | ||
|
||
DISABLE SECURE BOOT! You have been warned! Secure Boot is **KNOWN** to cause issues with the nvidia drivers. | ||
ENABLE 4G DECODING in the BIOS. This is needed for certain nvidia cards to work such as the Tesla P40. | ||
|
||
This uses CentOS Stream 9 as the base image to (hopefully) be as stable as possible. Fedora 40 was tried first, but its kernel was moving too fast | ||
for the nvidia drivers to keep up / work properly / update correctly. | ||
|
||
IMPORTANT NOTE: | ||
On boot, the nvidia drivers will **not** be loaded until they are compiled. This is because akmods are supposed to be built on boot, but this doesn't work with bootc. | ||
Instead, nvidia-drivers.service recompiles the drivers with akmods and loads them with modprobe on boot, which may take a minute. | ||
If you have any systemd services that require the nvidia drivers, you will need to add `After=nvidia-drivers.service` to the service or have it start LATE in the boot order (e.g. multi-user.target) | ||
to ensure that the nvidia drivers are loaded before the service starts. | ||
|
||
For example, if you have a podman container with --restart=always, the drivers must be loaded before podman-restart.service (and its podman-restart.timer) starts it. | ||
This ordering has already been done for you via `Before=` in the nvidia-drivers.service and nvidia-toolkit-firstboot.service files. | ||
|
||
Note about the nvidia-toolkit-firstboot.service file: This is a one-time service on boot that will create the /etc/cdi/nvidia.yaml file. This is necessary for podman | ||
to use GPU devices. | ||
|
||
|
||
**Running:** | ||
1. In your OTHER Containerfile, use `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` (the image built from this Containerfile). | ||
2. The nvidia drivers will be recompiled with akmods and loaded with modprobe on boot. | ||
3. Run the `nvidia-smi` command on the booted image to see if it works. |
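For services of your own that need the GPU, the `After=nvidia-drivers.service` advice in the README can be applied with a systemd drop-in rather than by editing the unit itself. This is a minimal sketch under assumptions: the helper `write_gpu_dropin`, the drop-in file name `10-nvidia-drivers.conf`, and the unit name `my-gpu-app.service` are all hypothetical.

```shell
set -eu

# write_gpu_dropin ROOT UNIT
# Write a drop-in that orders UNIT after nvidia-drivers.service and print its path.
# ROOT is a filesystem prefix: empty on a live system, a scratch dir when testing.
write_gpu_dropin() {
    local root="$1"
    local unit="$2"
    local dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir"
    cat > "$dir/10-nvidia-drivers.conf" <<'EOF'
[Unit]
# Ordering only: start this unit after the NVIDIA module build/load has finished.
After=nvidia-drivers.service
EOF
    printf '%s\n' "$dir/10-nvidia-drivers.conf"
}

# Demonstration in a scratch root; "my-gpu-app.service" is a hypothetical unit name.
scratch="$(mktemp -d)"
dropin="$(write_gpu_dropin "$scratch" my-gpu-app.service)"
cat "$dropin"
```

Note that `After=` only controls ordering and does not pull the other unit in. That is sufficient here because the Containerfile already enables nvidia-drivers.service. In a derived Containerfile the same file could instead be shipped with a `COPY` step.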
24 changes: 24 additions & 0 deletions
bootc-nvidia-base-centos/usr/lib/systemd/system/nvidia-drivers.service
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
[Unit] | ||
Description=Bootc User Overlay and NVIDIA Setup will generate the kernel module and load the nvidia driver | ||
|
||
# Done before k3s and toolkit-firstboot | ||
Before=nvidia-toolkit-firstboot.service | ||
# Must be done BEFORE the podman-restart.service or podman.service (if using API) in case we are using GPU for podman for testing nvidia-smi | ||
Before=podman-restart.service podman.service | ||
|
||
# Ensure it runs before multi-user.target which would load | ||
# services such as k3s, etc. | ||
Before=multi-user.target | ||
|
||
# VERY VERY BAD way of implementing this as we have to do usroverlay just to get the nvidia driver to work | ||
# but I do not know how to get the nvidia driver to work without usroverlay to build the kernel and load it. | ||
[Service] | ||
Type=oneshot | ||
ExecStart=-/usr/bin/bootc usroverlay | ||
ExecStart=/usr/sbin/akmods --force | ||
ExecStart=/usr/sbin/modprobe nvidia | ||
RemainAfterExit=true | ||
TimeoutStartSec=300 | ||
|
||
[Install] | ||
WantedBy=basic.target |
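Because nvidia-drivers.service builds and loads the module at boot, it can help to confirm the module is listed before running GPU workloads by hand. The sketch below assumes the usual `/proc/modules` layout (module name in the first field); the helpers `module_loaded` and `wait_for_module` are hypothetical, and the sample lines are made up for the demonstration.

```shell
set -eu

# module_loaded NAME [MODULES_FILE]
# Succeed if NAME appears as a module name (first field) in MODULES_FILE,
# which defaults to /proc/modules.
module_loaded() {
    local name="$1"
    local file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# wait_for_module NAME TRIES [MODULES_FILE]
# Poll roughly once per second, up to TRIES times, until NAME is listed.
wait_for_module() {
    local name="$1"
    local tries="$2"
    local file="${3:-/proc/modules}"
    while [ "$tries" -gt 0 ]; do
        if module_loaded "$name" "$file"; then return 0; fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Demonstration with a fake modules file so the sketch runs without a GPU.
fake="$(mktemp)"
printf 'nvidia_uvm 1 0 - Live 0x0\nnvidia 2 1 nvidia_uvm, Live 0x0\n' > "$fake"
if module_loaded nvidia "$fake"; then echo "nvidia module listed"; fi
```

On a booted image, `wait_for_module nvidia 300` would roughly match the unit's `TimeoutStartSec=300`. Matching on the exact first field avoids a false positive from related modules such as `nvidia_uvm`.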
12 changes: 12 additions & 0 deletions
bootc-nvidia-base-centos/usr/lib/systemd/system/nvidia-persist.service
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
[Unit] | ||
Description=NVIDIA Persistence Daemon | ||
Wants=syslog.target | ||
After=nvidia-drivers.service | ||
|
||
[Service] | ||
Type=forking | ||
ExecStart=/usr/bin/nvidia-persistenced --user root | ||
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced | ||
|
||
[Install] | ||
WantedBy=multi-user.target |
18 changes: 18 additions & 0 deletions
bootc-nvidia-base-centos/usr/lib/systemd/system/nvidia-toolkit-firstboot.service
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
[Unit] | ||
# For more information see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html | ||
Description=Generate /etc/cdi/nvidia.yaml to be used by Podman | ||
# Ensure we do this AFTER the nvidia-drivers.service | ||
After=nvidia-drivers.service | ||
# Must be done BEFORE the podman-restart.service or podman.service (if using API) | ||
# since /etc/cdi/nvidia.yaml is used by podman to access GPU | ||
Before=podman-restart.service podman.service | ||
|
||
[Service] | ||
Type=oneshot | ||
ExecStart=-/usr/bin/mkdir -p /etc/cdi | ||
ExecStart=/bin/bash -c '/usr/bin/nvidia-ctk cdi generate > /etc/cdi/nvidia.yaml' | ||
RemainAfterExit=yes | ||
TimeoutStartSec=300 | ||
|
||
[Install] | ||
WantedBy=basic.target |
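One thing to be aware of in the `ExecStart` above: a shell `>` redirection truncates `/etc/cdi/nvidia.yaml` before the generator runs, so a failed run leaves an empty file behind. A possible alternative, sketched here with a hypothetical helper `generate_cdi`, writes to a temporary file first and only renames it into place on success. This is an illustration, not what the commit does.

```shell
set -eu

# generate_cdi OUT_FILE GENERATOR [ARGS...]
# Run GENERATOR, capture its stdout in a temp file next to OUT_FILE, and rename
# it into place only if the generator succeeded and produced non-empty output.
generate_cdi() {
    local out="$1"
    shift
    local dir
    local tmp
    dir="$(dirname "$out")"
    mkdir -p "$dir"
    tmp="$(mktemp "$dir/.cdi.XXXXXX")"
    if "$@" > "$tmp" && [ -s "$tmp" ]; then
        mv -f "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# On the booted image this would be roughly:
#   generate_cdi /etc/cdi/nvidia.yaml /usr/bin/nvidia-ctk cdi generate
# Here a stand-in generator (printf) is used so the sketch runs without a GPU.
scratch="$(mktemp -d)"
generate_cdi "$scratch/cdi/nvidia.yaml" printf 'cdiVersion: placeholder\n'
cat "$scratch/cdi/nvidia.yaml"
```

After the real file has been generated, `nvidia-ctk cdi list` can be used to check which device names were produced.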