Commit

update
cdrage committed Nov 16, 2024
1 parent 1cd037f commit 8bb4854
Showing 13 changed files with 207 additions and 27 deletions.
52 changes: 41 additions & 11 deletions README.md
@@ -34,6 +34,7 @@ Below is a general overview (with instructions) on each Docker container I use.
- [bootc-k3s-master-amd64](#bootc-k3s-master-amd64)
- [bootc-k3s-node-amd64](#bootc-k3s-node-amd64)
- [bootc-microshift-centos](#bootc-microshift-centos)
- [bootc-nvidia-base-centos](#bootc-nvidia-base-centos)
- [bootc-nvidia-base-fedora](#bootc-nvidia-base-fedora)
- [cat](#cat)
- [gameserver](#gameserver)
@@ -119,7 +120,7 @@ Below is a general overview (with instructions) on each Docker container I use.
* This is good for situations like cloud providers, usb sticks, etc.

**GPU:**
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-fedora` / see `bootc-nvidia-base-fedora` folder for more details.
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-centos` / see `bootc-nvidia-base-centos` folder for more details.
* GPU drivers will be built + loaded on each boot.
* Explaining **how** to use GPU with k3s is outside the scope of this README, but see the k3s advanced docs for more information: https://docs.k3s.io/advanced#nvidia-container-runtime-support (read them thoroughly, as you WILL need nvidia-device-plugin installed and modified to ensure it has runtimeClassName set).

@@ -158,7 +159,7 @@ Below is a general overview (with instructions) on each Docker container I use.
* This is good for situations like cloud providers, usb sticks, etc.

**GPU:**
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-fedora` / see `bootc-nvidia-base-fedora` folder for more details.
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-centos` / see `bootc-nvidia-base-centos` folder for more details.
* GPU drivers will be built + loaded on each boot.
* Explaining **how** to use GPU with k3s is outside the scope of this README, but see the k3s advanced docs for more information: https://docs.k3s.io/advanced#nvidia-container-runtime-support (read them thoroughly, as you WILL need nvidia-device-plugin installed and modified to ensure it has runtimeClassName set).

@@ -208,6 +209,38 @@ Below is a general overview (with instructions) on each Docker container I use.
RUN echo -e ' OpenShift 4.17 release\n\
Dependencies\n\

## [bootc-nvidia-base-centos](/bootc-nvidia-base-centos/Containerfile)

**Description:**
> IMPORTANT NOTE: This is BOOTC. This is meant for bootable container applications. See: https://github.com/containers/podman-desktop-extension-bootc
This is a "base" container that installs the nvidia drivers and the nvidia container toolkit.
This is meant to be used as a base for other containers that need GPU access.

DISABLE SECURE BOOT! You have been warned! Secure boot is **KNOWN** to cause issues with the nvidia drivers.
ENABLE 4G DECODING in the BIOS. This is needed for certain nvidia cards to work such as the Tesla P40.

This uses CentOS Stream 9 as the base image to (hopefully) be as stable as possible. Tried with Fedora 40 but found that the kernel was moving too fast
for the nvidia drivers to keep up / work properly / update correctly.

IMPORTANT NOTE:
On boot, this will **not** have the nvidia drivers loaded until they are compiled. This is because akmods are supposed to be built on boot, but this doesn't work with bootc.
Instead, the nvidia drivers will recompile + use akmod + modprobe on boot, and may take a minute to load.
If you have any systemd services that require the nvidia drivers, you will need to add `After=nvidia-drivers.service` to the service or have it start LATE in the boot order (e.g. multi-user.target)
to ensure that the nvidia drivers are loaded before the service starts.

For example, if you have a podman container with --restart=always, you will need to add `After=nvidia-drivers.service` to the podman-restart.service and podman-restart.timer files.
This has been done for you already within the nvidia-drivers.service and nvidia-toolkit-firstboot.service files.
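
For your own services, the `After=nvidia-drivers.service` ordering can be added with a systemd drop-in. This is only a rough sketch and not something shipped in this repo: the function names, the drop-in file name `10-nvidia.conf`, and the example unit `my-gpu-app.service` are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: the function names, the drop-in file name, and the example
# unit name are assumptions for illustration, not part of this repository.

# Print a drop-in that orders a unit after nvidia-drivers.service.
render_gpu_dropin() {
    printf '%s\n' '[Unit]' 'After=nvidia-drivers.service'
}

# Write the drop-in for unit $1 beneath systemd directory $2
# (defaults to /etc/systemd/system).
install_gpu_dropin() {
    unit="$1"
    root="${2:-/etc/systemd/system}"
    mkdir -p "$root/$unit.d" || return 1
    render_gpu_dropin > "$root/$unit.d/10-nvidia.conf"
}
```

Something like `install_gpu_dropin my-gpu-app.service` followed by `systemctl daemon-reload` should apply it. Note that `After=` only affects ordering; it does not start nvidia-drivers.service by itself.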

Note about the nvidia-toolkit-firstboot.service file: This is a one-time service on boot that will create the /etc/cdi/nvidia.yaml file. This is necessary for podman
to use GPU devices.


**Running:**
1. In your OTHER Containerfile, change to `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` / this Containerfile.
2. The nvidia drivers will recompile + use akmod + modprobe on boot.
3. Use nvidia-smi command within the booted container image to see if it works.
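
A derived image for the steps above could be scaffolded roughly like this. It is a sketch under assumptions: the function name `write_gpu_containerfile` and the directory and tag names in the usage note are hypothetical, and only the `FROM` line comes from this README.

```shell
#!/bin/sh
# Sketch only: the function name is an assumption for illustration.

# Write a minimal derived Containerfile into directory $1.
write_gpu_containerfile() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/Containerfile" <<'EOF'
FROM git.k8s.land/cdrage/bootc-nvidia-base-centos
# Add your own packages and services below this line.
EOF
}
```

For example, `write_gpu_containerfile ./my-image` followed by `podman build -t my-gpu-image ./my-image` would build it. After booting the resulting image, give the drivers a minute to compile and then run `nvidia-smi`.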

## [bootc-nvidia-base-fedora](/bootc-nvidia-base-fedora/Containerfile)

**Description:**
@@ -223,6 +256,8 @@ RUN echo -e ' OpenShift 4.17 release\n\
for the nvidia drivers to keep up / work properly / update correctly.

IMPORTANT NOTE:
ANOTHER important note!!! Older cards such as the Tesla P40 MAY not work because the drivers are "too new". I had multiple issues with the P40 and the drivers, but no problems with the RTX 3060 I have.

On boot, this will **not** have the nvidia drivers loaded until they are compiled. This is because akmods are supposed to be built on boot, but this doesn't work with bootc.
Instead, the nvidia drivers will recompile + use akmod + modprobe on boot, and may take a minute to load.
If you have any systemd services that require the nvidia drivers, you will need to add `After=nvidia-drivers.service` to the service or have it start LATE in the boot order (e.g. multi-user.target)
@@ -236,7 +271,7 @@ RUN echo -e ' OpenShift 4.17 release\n\


**Running:**
1. In your OTHER Containerfile, change to `FROM git.k8s.land/cdrage/bootc-nvidia-base-fedora` / this Containerfile.
1. In your OTHER Containerfile, change to `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` / this Containerfile.
2. The nvidia drivers will recompile + use akmod + modprobe on boot.
3. Use nvidia-smi command within the booted container image to see if it works.

@@ -424,18 +459,13 @@ RUN echo -e ' OpenShift 4.17 release\n\

**IMPORTANT NOTE:**
**Description:**

This is a "hello world" GPU container that showcases fractals by using a "minimal POC" vulkan compute example project.
Every X seconds, the fractal will be recalculated and displayed in the browser. This is all rendered on the virtualized GPU.

Runs a stress test on the GPU using Vulkan. This is meant to be run on a Mac Silicon machine with a GPU.

**Technical Description:**
You must use Podman Desktop with Podman 5.2.0 or above and run a
podman machine with libkrun support.

For a more technical TLDR it is:
* Creates a virtualized Vulkan GPU interface
* Virtualized GPU is passed to a vulkan-to-metal layer on the host MacOS
* Uses https://github.com/containers/libkrun for all of this to work.

Source code:
In order for this to work, a patched version of mesa / vulkan is used. The source for this is located here: https://download.copr.fedorainfracloud.org/results/slp/mesa-krunkit/fedora-39-aarch64/07045714-mesa/mesa-23.3.5-102.src.rpm
@@ -448,7 +478,7 @@ RUN echo -e ' OpenShift 4.17 release\n\
podman run -d \
-p 6080:6080 \
--device /dev/dri \
vulkan-mac-silicon-gpu-fractals
vulkan-mac-silicon-gpu-stress-test
```

Then visit http://localhost:6080 in your browser.
2 changes: 1 addition & 1 deletion bootc-k3s-master-amd64/Containerfile
@@ -16,7 +16,7 @@
# * This is good for situations like cloud providers, usb sticks, etc.
#
# **GPU:**
# * Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-fedora` / see `bootc-nvidia-base-fedora` folder for more details.
# * Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-centos` / see `bootc-nvidia-base-centos` folder for more details.
# * GPU drivers will be built + loaded on each boot.
# * Explaining **how** to use GPU with k3s is outside the scope of this README, but see the k3s advanced docs for more information: https://docs.k3s.io/advanced#nvidia-container-runtime-support (read them thoroughly, as you WILL need nvidia-device-plugin installed and modified to ensure it has runtimeClassName set).
#
2 changes: 1 addition & 1 deletion bootc-k3s-master-amd64/README.md
@@ -15,7 +15,7 @@
* This is good for situations like cloud providers, usb sticks, etc.

**GPU:**
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-fedora` / see `bootc-nvidia-base-fedora` folder for more details.
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-centos` / see `bootc-nvidia-base-centos` folder for more details.
* GPU drivers will be built + loaded on each boot.
* Explaining **how** to use GPU with k3s is outside the scope of this README, but see the k3s advanced docs for more information: https://docs.k3s.io/advanced#nvidia-container-runtime-support (read them thoroughly, as you WILL need nvidia-device-plugin installed and modified to ensure it has runtimeClassName set).

2 changes: 1 addition & 1 deletion bootc-k3s-node-amd64/Containerfile
@@ -15,7 +15,7 @@
# * This is good for situations like cloud providers, usb sticks, etc.
#
# **GPU:**
# * Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-fedora` / see `bootc-nvidia-base-fedora` folder for more details.
# * Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-centos` / see `bootc-nvidia-base-centos` folder for more details.
# * GPU drivers will be built + loaded on each boot.
# * Explaining **how** to use GPU with k3s is outside the scope of this README, but see the k3s advanced docs for more information: https://docs.k3s.io/advanced#nvidia-container-runtime-support (read them thoroughly, as you WILL need nvidia-device-plugin installed and modified to ensure it has runtimeClassName set).
#
2 changes: 1 addition & 1 deletion bootc-k3s-node-amd64/README.md
@@ -13,7 +13,7 @@
* This is good for situations like cloud providers, usb sticks, etc.

**GPU:**
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-fedora` / see `bootc-nvidia-base-fedora` folder for more details.
* Want GPU? Change the FROM to `git.k8s.land/cdrage/bootc-nvidia-base-centos` / see `bootc-nvidia-base-centos` folder for more details.
* GPU drivers will be built + loaded on each boot.
* Explaining **how** to use GPU with k3s is outside the scope of this README, but see the k3s advanced docs for more information: https://docs.k3s.io/advanced#nvidia-container-runtime-support (read them thoroughly, as you WILL need nvidia-device-plugin installed and modified to ensure it has runtimeClassName set).

68 changes: 68 additions & 0 deletions bootc-nvidia-base-centos/Containerfile
@@ -0,0 +1,68 @@
# **Description:**
# > IMPORTANT NOTE: This is BOOTC. This is meant for bootable container applications. See: https://github.com/containers/podman-desktop-extension-bootc
#
# This is a "base" container that installs the nvidia drivers and the nvidia container toolkit.
# This is meant to be used as a base for other containers that need GPU access.
#
# DISABLE SECURE BOOT! You have been warned! Secure boot is **KNOWN** to cause issues with the nvidia drivers.
# ENABLE 4G DECODING in the BIOS. This is needed for certain nvidia cards to work such as the Tesla P40.
#
# This uses CentOS Stream 9 as the base image to (hopefully) be as stable as possible. Tried with Fedora 40 but found that the kernel was moving too fast
# for the nvidia drivers to keep up / work properly / update correctly.
#
# IMPORTANT NOTE:
# On boot, this will **not** have the nvidia drivers loaded until they are compiled. This is because akmods are supposed to be built on boot, but this doesn't work with bootc.
# Instead, the nvidia drivers will recompile + use akmod + modprobe on boot, and may take a minute to load.
# If you have any systemd services that require the nvidia drivers, you will need to add `After=nvidia-drivers.service` to the service or have it start LATE in the boot order (e.g. multi-user.target)
# to ensure that the nvidia drivers are loaded before the service starts.
#
# For example, if you have a podman container with --restart=always, you will need to add `After=nvidia-drivers.service` to the podman-restart.service and podman-restart.timer files.
# This has been done for you already within the nvidia-drivers.service and nvidia-toolkit-firstboot.service files.
#
# Note about the nvidia-toolkit-firstboot.service file: This is a one-time service on boot that will create the /etc/cdi/nvidia.yaml file. This is necessary for podman
# to use GPU devices.
#
#
# **Running:**
# 1. In your OTHER Containerfile, change to `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` / this Containerfile.
# 2. The nvidia drivers will recompile + use akmod + modprobe on boot.
# 3. Use nvidia-smi command within the booted container image to see if it works.
FROM quay.io/centos-bootc/centos-bootc:stream9

#! Set the kernel version explicitly, as we MUST install the kernel-devel package that matches the kernel used in the base image. It must match what stream9 ships, which is unpredictable at times.
#! This is because the base image may carry a kernel that lags behind the newest one available at build time.
#! For example, as of writing this, the kernel in the base OS is 5.14.0-526.el9.x86_64, but `dnf install kernel-devel` installs 5.14.0-527.el9.x86_64, causing a conflict / mismatch,
#! especially for NVIDIA drivers, which are very picky about the kernel version.
ARG KERNEL_VERSION='5.14.0-527.el9.x86_64'

#! Install EPEL plus the rpmfusion free and nonfree repos for access to the nvidia drivers
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
dnf install --nogpgcheck -y https://mirrors.rpmfusion.org/free/el/rpmfusion-free-release-$(rpm -E %rhel).noarch.rpm && \
dnf install --nogpgcheck -y https://mirrors.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-$(rpm -E %rhel).noarch.rpm

#! Install the kernel, devel and headers
RUN dnf install -y kernel-$KERNEL_VERSION kernel-devel-$KERNEL_VERSION kernel-headers-$KERNEL_VERSION

#! Make sure the kernel installed is part of the initramfs
#! this is a "forced" upgrade of the initramfs to ensure that the kernel is part of the initramfs / we use the updated kernel
#! we are also required to delete all other kernels in /usr/lib/modules that are not $KERNEL_VERSION
RUN set -x; dracut -vf /usr/lib/modules/$KERNEL_VERSION/initramfs.img $KERNEL_VERSION

#! Delete everything in /usr/lib/modules that is not $KERNEL_VERSION
RUN find /usr/lib/modules -mindepth 1 -maxdepth 1 -type d -not -name $KERNEL_VERSION -exec rm -rf {} \;

#! Install the nvidia drivers
RUN dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda

#! Install NVIDIA container toolkit
RUN curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | tee /etc/yum.repos.d/nvidia-container-toolkit.repo && \
dnf install -y nvidia-container-toolkit

#! Blacklist the nouveau driver to ensure NVIDIA drivers function properly
RUN echo "blacklist nouveau" > /etc/modprobe.d/blacklist_nouveau.conf

#! Copy necessary usr files
COPY usr/ /usr/

#! Enable necessary services to be started at boot
RUN systemctl enable nvidia-toolkit-firstboot.service nvidia-drivers.service nvidia-persist.service
29 changes: 29 additions & 0 deletions bootc-nvidia-base-centos/README.md
@@ -0,0 +1,29 @@
**Description:**
> IMPORTANT NOTE: This is BOOTC. This is meant for bootable container applications. See: https://github.com/containers/podman-desktop-extension-bootc
This is a "base" container that installs the nvidia drivers and the nvidia container toolkit.
This is meant to be used as a base for other containers that need GPU access.

DISABLE SECURE BOOT! You have been warned! Secure boot is **KNOWN** to cause issues with the nvidia drivers.
ENABLE 4G DECODING in the BIOS. This is needed for certain nvidia cards to work such as the Tesla P40.

This uses CentOS Stream 9 as the base image to (hopefully) be as stable as possible. Tried with Fedora 40 but found that the kernel was moving too fast
for the nvidia drivers to keep up / work properly / update correctly.

IMPORTANT NOTE:
On boot, this will **not** have the nvidia drivers loaded until they are compiled. This is because akmods are supposed to be built on boot, but this doesn't work with bootc.
Instead, the nvidia drivers will recompile + use akmod + modprobe on boot, and may take a minute to load.
If you have any systemd services that require the nvidia drivers, you will need to add `After=nvidia-drivers.service` to the service or have it start LATE in the boot order (e.g. multi-user.target)
to ensure that the nvidia drivers are loaded before the service starts.

For example, if you have a podman container with --restart=always, you will need to add `After=nvidia-drivers.service` to the podman-restart.service and podman-restart.timer files.
This has been done for you already within the nvidia-drivers.service and nvidia-toolkit-firstboot.service files.

Note about the nvidia-toolkit-firstboot.service file: This is a one-time service on boot that will create the /etc/cdi/nvidia.yaml file. This is necessary for podman
to use GPU devices.
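
To check that the first-boot service did its job, something like the sketch below may help. The function names are made up for illustration, and the printed podman command follows the CDI example in the NVIDIA Container Toolkit docs (the `ubuntu` image is just a placeholder); treat it as a starting point, not a tested part of this image.

```shell
#!/bin/sh
# Sketch only: function names are assumptions for illustration.

# Succeed if the CDI spec at $1 (default /etc/cdi/nvidia.yaml)
# exists and is non-empty.
cdi_spec_ready() {
    spec="${1:-/etc/cdi/nvidia.yaml}"
    [ -s "$spec" ]
}

# Print (do not run) a podman command that requests all GPUs through CDI.
print_gpu_smoke_test() {
    echo 'podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L'
}
```

On a booted system, `cdi_spec_ready && print_gpu_smoke_test` prints the command to try only once the spec file has been generated.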


**Running:**
1. In your OTHER Containerfile, change to `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` / this Containerfile.
2. The nvidia drivers will recompile + use akmod + modprobe on boot.
3. Use nvidia-smi command within the booted container image to see if it works.
@@ -0,0 +1,24 @@
[Unit]
Description=Bootc User Overlay and NVIDIA Setup will generate the kernel module and load the nvidia driver

# Done before k3s and toolkit-firstboot
Before=nvidia-toolkit-firstboot.service
# Must be done BEFORE the podman-restart.service or podman.service (if using API) in case we are using GPU for podman for testing nvidia-smi
Before=podman-restart.service podman.service

# Ensure it runs before multi-user.target which would load
# services such as k3s, etc.
Before=multi-user.target

# VERY VERY BAD way of implementing this as we have to do usroverlay just to get the nvidia driver to work
# but I do not know how to get the nvidia driver to work without usroverlay to build the kernel and load it.
[Service]
Type=oneshot
ExecStart=-/usr/bin/bootc usroverlay
ExecStart=/usr/sbin/akmods --force
ExecStart=/usr/sbin/modprobe nvidia
RemainAfterExit=true
TimeoutStartSec=300

[Install]
WantedBy=basic.target
@@ -0,0 +1,12 @@
[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target
After=nvidia-drivers.service

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user root
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target
@@ -0,0 +1,18 @@
[Unit]
# For more information see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
Description=Generate /etc/cdi/nvidia.yaml to be used by Podman
# Ensure we do this AFTER the nvidia-drivers.service
After=nvidia-drivers.service
# Must be done BEFORE the podman-restart.service or podman.service (if using API)
# since /etc/cdi/nvidia.yaml is used by podman to access GPU
Before=podman-restart.service podman.service

[Service]
Type=oneshot
ExecStart=-/usr/bin/mkdir -p /etc/cdi
ExecStart=/bin/bash -c '/usr/bin/nvidia-ctk cdi generate > /etc/cdi/nvidia.yaml'
RemainAfterExit=yes
TimeoutStartSec=300

[Install]
WantedBy=basic.target