Add rootFS and driverInstallDir fields to ClusterPolicy #747
Changes from all commits
185d521
8ebc3a6
c46ade7
7685d9f
8e4021c
d53e3e3
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,3 +12,9 @@ rules: | |
- use | ||
resourceNames: | ||
- privileged | ||
- apiGroups: | ||
- apps | ||
resources: | ||
- daemonsets | ||
verbs: | ||
- list |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,12 +36,16 @@ spec: | |
value: "true" | ||
- name: COMPONENT | ||
value: driver | ||
- name: OPERATOR_NAMESPACE | ||
valueFrom: | ||
fieldRef: | ||
fieldPath: metadata.namespace | ||
securityContext: | ||
privileged: true | ||
seLinuxOptions: | ||
level: "s0" | ||
volumeMounts: | ||
- name: driver-install-path | ||
- name: driver-install-dir | ||
mountPath: /run/nvidia/driver | ||
Question (and maybe out of scope for this PR): Does it make sense to set this to
I am using the exact hostPath so that dev-char symlink creation works correctly. I want to make sure the target of the symlink resolves correctly on the host. So for example, we create a symlink
I believe we'd be able to create the symlink regardless of whether the target exists or not, so as long as
In the medium term we could consider moving this out of the toolkit container anyway ... since it really should be a postcondition of the driver container. Let's leave it as is for now though. |
||
mountPropagation: HostToContainer | ||
- name: run-nvidia-validations | ||
|
@@ -67,6 +71,8 @@ spec: | |
value: "management.nvidia.com/gpu" | ||
- name: NVIDIA_VISIBLE_DEVICES | ||
value: "void" | ||
- name: TOOLKIT_PID_FILE | ||
For reference: this requires NVIDIA/nvidia-container-toolkit#544 |
||
value: "/run/nvidia/toolkit/toolkit.pid" | ||
imagePullPolicy: IfNotPresent | ||
name: nvidia-container-toolkit-ctr | ||
securityContext: | ||
|
@@ -78,13 +84,17 @@ spec: | |
readOnly: true | ||
mountPath: /bin/entrypoint.sh | ||
subPath: entrypoint.sh | ||
- name: nvidia-run-path | ||
mountPath: /run/nvidia | ||
mountPropagation: Bidirectional | ||
- name: toolkit-root | ||
mountPath: /run/nvidia/toolkit | ||
- name: run-nvidia-validations | ||
mountPath: /run/nvidia/validations | ||
- name: toolkit-install-dir | ||
mountPath: /usr/local/nvidia | ||
- name: crio-hooks | ||
mountPath: /usr/share/containers/oci/hooks.d | ||
- name: driver-install-dir | ||
mountPath: /driver-root | ||
mountPropagation: HostToContainer | ||
- name: host-root | ||
mountPath: /host | ||
readOnly: true | ||
|
@@ -96,17 +106,18 @@ spec: | |
configMap: | ||
name: nvidia-container-toolkit-entrypoint | ||
defaultMode: 448 | ||
- name: nvidia-run-path | ||
- name: toolkit-root | ||
hostPath: | ||
path: /run/nvidia | ||
path: /run/nvidia/toolkit | ||
type: DirectoryOrCreate | ||
- name: run-nvidia-validations | ||
hostPath: | ||
path: /run/nvidia/validations | ||
type: DirectoryOrCreate | ||
- name: driver-install-path | ||
- name: driver-install-dir | ||
hostPath: | ||
path: /run/nvidia/driver | ||
type: DirectoryOrCreate | ||
- name: host-root | ||
hostPath: | ||
path: / | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,28 +9,15 @@ data: | |
entrypoint.sh: |- | ||
#!/bin/bash | ||
|
||
driver_root="" | ||
container_driver_root="" | ||
while true; do | ||
if [[ -f /run/nvidia/validations/host-driver-ready ]]; then | ||
driver_root=/ | ||
container_driver_root=/host | ||
break | ||
elif [[ -f /run/nvidia/validations/driver-ready ]]; then | ||
driver_root=/run/nvidia/driver | ||
container_driver_root=$driver_root | ||
break | ||
else | ||
echo "waiting for the driver validations to be ready..." | ||
sleep 5 | ||
fi | ||
until [[ -f /run/nvidia/validations/driver-ready ]] | ||
|
||
do | ||
echo "waiting for the driver validations to be ready..." | ||
sleep 5 | ||
done | ||
|
||
export NVIDIA_DRIVER_ROOT=$driver_root | ||
echo "NVIDIA_DRIVER_ROOT=$NVIDIA_DRIVER_ROOT" | ||
|
||
export CONTAINER_DRIVER_ROOT=$container_driver_root | ||
echo "CONTAINER_DRIVER_ROOT=$CONTAINER_DRIVER_ROOT" | ||
|
||
set -o allexport | ||
cat /run/nvidia/validations/driver-ready | ||
Does it also export the custom toolkit installation path from values in the toolkit to the device plugin for the nvidia-ctk tool?
No. But we set the |
||
. /run/nvidia/validations/driver-ready | ||
|
||
echo "Starting nvidia-device-plugin" | ||
exec nvidia-device-plugin |
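The new entrypoint logic above (poll for the `driver-ready` file, then source it with `allexport` enabled) can be sketched in isolation as follows. This is a minimal illustration, not the operator's actual script: the `wait_and_source` helper, the temporary file, and the `EXAMPLE_DRIVER_ROOT` variable are hypothetical, and it assumes the ready file contains plain shell `KEY=value` assignments. Unlike the script in the diff, the sketch turns `allexport` back off after sourcing.

```shell
#!/bin/bash
# Sketch: wait for a readiness file, then export every variable it defines.
# The helper name, file location, and variable name are hypothetical.

wait_and_source() {
  local ready_file="$1"
  local interval="${2:-5}"

  # Poll until the readiness file exists.
  until [[ -f "$ready_file" ]]; do
    echo "waiting for $ready_file..."
    sleep "$interval"
  done

  # With allexport on, every assignment made while sourcing the file
  # is marked for export to child processes.
  set -o allexport
  . "$ready_file"
  set +o allexport
}

# Demo with a throwaway file standing in for the real ready file.
demo_dir="$(mktemp -d)"
ready_file="$demo_dir/driver-ready"
printf 'EXAMPLE_DRIVER_ROOT=/run/nvidia/driver\n' > "$ready_file"

wait_and_source "$ready_file" 1

# A child process sees the variable because it was exported.
bash -c 'echo "child sees EXAMPLE_DRIVER_ROOT=$EXAMPLE_DRIVER_ROOT"'
```

Because the variables are exported rather than merely set, a process started afterwards with `exec` inherits them, which is what lets the validator hand settings to the device plugin through a single file.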
I was not aware of the toolkit inspecting daemonsets. Was this change intentional?
Is this because the validator runs here too? I thought we changed the logic to just look for the ready file?
Yes, it is because the validator runs here as an init container. We still need to run the driver validation here since the operator-validator pod runs with the `nvidia` runtime class and thus needs the toolkit pod to run first before it can start performing any validations.