-
Notifications
You must be signed in to change notification settings - Fork 5
/
Containerfile
71 lines (61 loc) · 4.67 KB
/
Containerfile
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# **Description:**
# > IMPORTANT NOTE: This is BOOTC. This is meant for bootable container applications. See: https://github.com/containers/podman-desktop-extension-bootc
#
# This is a "base" container that installs the nvidia drivers and the nvidia container toolkit.
# This is meant to be used as a base for other containers that need GPU access.
#
# DISABLE SECURE BOOT! You have been warned! Disable boot is **KNOWN** to cause issues with the nvidia drivers.
# ENABLE 4G DECODING in the BIOS. This is needed for certain nvidia cards to work such as the Tesla P40.
#
# CentOS vs Fedora NVIDIA base image...
# * centos base image uses an older nvidia driver which may work better for you (as of right now, 550 and cuda: 12.4
# * fedora base image uses a newer one (as of right now, 565 and cuda: 12.7)
#
# If you find one works vs the other, please let me know. Right now, the Tesla P40 works only with fedora image, but the RTX 3060 works with both... very odd!
#
# IMPORTANT NOTE:
# On boot, this will **not** have the nvidia drivers loaded it they are compiled. This is because akmods are suppose to be built on boot, but this doesn't work with bootc.
# Instead, the nvidia drivers will recompile + use akmod + modprobe on boot.. and may take a minute to load.
# If you have any systemd services that require the nvidia drivers, you will need to add a `After=nvidia-drivers.service` to the service or have it LATE in the boot order (ex. multi-user.target)
# to ensure that the nvidia drivers are loaded before the service starts.
#
# For example, if you have a podman container with --restart=always, you will need to add a `After=nvidia-drivers.service` to the podman-restart.service and podman-restart.timer. file.
# This has been done for you already within the nvidia-drivers.service and nvidia-toolkit-firstboot.service files.
#
# Note about nvidia-toolkit-fristboot.service file: This is a one-time service on boot that will create the /etc/cdi/nvidia.yaml file. This is necessary for podman
# to use gpu devices.
#
#
# **Running:**
# 1. In your OTHER Containerfile, change to `FROM git.k8s.land/cdrage/bootc-nvidia-base-centos` / this Containerfile.
# 2. The nvidia drivers will recompile + use akmod + modprobe on boot.
# 3. Use nvidia-smi command within the booted container image to see if it works.
FROM quay.io/centos-bootc/centos-bootc:stream9
#! Set kernel version as we MUST install the kernel-devel for the kernel that is being used in the base image too.. must match what stream9 has which is unpredicatable at times.
#! This is due to the base image having a non-updated kernel, between the time of the "builds" and the time that the newest kernel is out..
#! for example as of writing this the kernel is 5.14.0-526.el9.x86_64 in the base OS but if you do dnf install kernel-devel it will install 5.14.0-527.el9.x86_64, causing a conflict / mismatch,
#! especially for NVIDIA drivers which are very picky about the kernel version.
ARG KERNEL_VERSION='5.14.0-527.el9.x86_64'
#! Install rpmfusion free and nonfree repo's for access to the nvidia drivers
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
dnf install --nogpgcheck -y https://mirrors.rpmfusion.org/free/el/rpmfusion-free-release-$(rpm -E %rhel).noarch.rpm && \
dnf install --nogpgcheck -y https://mirrors.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-$(rpm -E %rhel).noarch.rpm
#! Install the kernel, devel and headers
RUN dnf install -y kernel-$KERNEL_VERSION kernel-devel-$KERNEL_VERSION kernel-headers-$KERNEL_VERSION
#! Make sure the kernel installed is part of the initramfs
#! this is a "forced" upgrade of the initramfs to ensure that the kernel is part of the initramfs / we use the updated kernel
#! we are also required to delete all other kernels in /usr/lib/modules that are not $KERNEL_VERSION
RUN set -x; dracut -vf /usr/lib/modules/$KERNEL_VERSION/initramfs.img $KERNEL_VERSION
#! Delete everything in /usr/lib/modules that is not $KERNEL_VERSION
RUN find /usr/lib/modules -mindepth 1 -maxdepth 1 -type d -not -name $KERNEL_VERSION -exec rm -rf {} \;
#! Install the nvidia drivers
RUN dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda
#! Install NVIDIA container toolkit
RUN curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | tee /etc/yum.repos.d/nvidia-container-toolkit.repo && \
dnf install -y nvidia-container-toolkit
#! Blacklist the nouveau driver to ensure NVIDIA drivers function properly
RUN echo "blacklist nouveau" > /etc/modprobe.d/blacklist_nouveau.conf
#! Copy necessary usr files
COPY usr/ /usr/
#! Enable necessary services to be started at boot
RUN systemctl enable nvidia-toolkit-firstboot.service nvidia-drivers.service nvidia-persist.service