Add rootFS and driverInstallDir fields to ClusterPolicy #747

Merged · 6 commits merged into main from custom-roots on Jun 17, 2024

Conversation

cdesiniotis (Contributor):

No description provided.

@elezar (Member) left a comment:

Thanks @cdesiniotis. It's a big PR, but I took a basic pass.

A general question is whether it makes sense to factor some of the transform logic we are adding into functions instead of implementing it inline in the three cases where we need it.

@@ -41,7 +41,7 @@ spec:
         seLinuxOptions:
           level: "s0"
       volumeMounts:
-        - name: driver-install-path
+        - name: driver-root
           mountPath: /run/nvidia/driver
Member:
Question (and maybe out of scope for this PR): Does it make sense to set this to /driver-root in the container instead?

cdesiniotis (Author):
I am using the exact hostPath so that dev-char symlink creation works correctly; I want to make sure the target of the symlink resolves correctly on the host. For example, we create a symlink /host-dev-char/195:0 --> /run/nvidia/driver/dev/nvidia0 in the driver validator container. If we transformed the target path before creating the symlink, then we could always mount the driver install directory at /driver-root or /driver-install-dir. What do you think?
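A minimal sketch in Go of the symlink creation being discussed (illustrative only, not the PR's actual code). The key point is that the link target must be the device-node path as it resolves on the host, not a container-local mount point such as /driver-root:

package main

import "os"

func main() {
	// 195:0 is the major:minor of the /dev/nvidia0 character device.
	target := "/run/nvidia/driver/dev/nvidia0" // must resolve on the host
	link := "/host-dev-char/195:0"
	if err := os.Symlink(target, link); err != nil && !os.IsExist(err) {
		panic(err)
	}
}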

Member:
I believe we'd be able to create the symlink regardless of whether the target exists or not, so as long as NVIDIA_DEV_ROOT (or NVIDIA_DEV_PATH) refers to the correct path on the host, the symlinks should work as expected.

In the medium term we could consider moving this out of the toolkit container anyway ... since it really should be a postcondition of the driver container.

Let's leave it as is for now though.

Comment on lines 126 to 128
	hostRootFlag   string
	driverRootFlag string
	devRootFlag    string
Member:
Instead of separate flags, how about the following:

type options struct {
	hostPaths struct {
		root       string
		driverRoot string
		devRoot    string
	}
}

(or even a named struct for the hostPaths).

If we don't want to update the function signatures to pass options instead, then we can create a global variable:

var hostPaths hostPathsType

and refer to this when setting the command-line arguments.
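For illustration, a minimal sketch of the global-variable variant using the standard flag package; the hostPathsType name, flag names, and defaults here are assumptions, not the PR's code:

package main

import (
	"flag"
	"fmt"
)

type hostPathsType struct {
	root       string
	driverRoot string
	devRoot    string
}

var hostPaths hostPathsType

func main() {
	// Bind the grouped host paths to command-line arguments.
	flag.StringVar(&hostPaths.root, "host-root", "/", "path to the host root filesystem")
	flag.StringVar(&hostPaths.driverRoot, "driver-root", "/", "root at which the NVIDIA driver is installed")
	flag.StringVar(&hostPaths.devRoot, "dev-root", "/", "root under which NVIDIA device nodes exist")
	flag.Parse()
	fmt.Printf("%+v\n", hostPaths)
}

Grouping the three paths keeps related configuration together and avoids threading three separate values through function signatures.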

@cdesiniotis cdesiniotis force-pushed the custom-roots branch 5 times, most recently from dc44164 to b6fe11d Compare June 13, 2024 23:54
@cdesiniotis cdesiniotis requested review from elezar and tariq1890 June 14, 2024 00:21
Comment on lines +74 to +81
// getDevRoot returns the dev root associated with the root.
// If the root is not a dev root, this defaults to "/".
func (r root) getDevRoot() string {
	if r.isDevRoot() {
		return string(r)
	}
	return "/"
}
Contributor:

I was surprised not to see a field for devPath in HostPathsSpec, but then I recalled a conversation about not needing to specify it if we assume that it is only ever at either:

  1. <DriverInstallDir>/dev; OR
  2. /dev

It looks like this is the logic that is implementing this assumption.

cdesiniotis (Author):

Correct. We decided to omit devPath from HostPathsSpec to start with. If a use case presents itself where device nodes need to exist at some other location, we can introduce this field to the API.
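For reference, a hedged sketch of the isDevRoot probe implied by this thread; the exact check in the PR may differ, but the stated assumption is that a root is a dev root when NVIDIA device nodes exist under <root>/dev:

package validator

import (
	"os"
	"path/filepath"
)

type root string

// isDevRoot reports whether NVIDIA device nodes exist under <root>/dev,
// probing for the well-known nvidiactl node.
func (r root) isDevRoot() bool {
	_, err := os.Stat(filepath.Join(string(r), "dev", "nvidiactl"))
	return err == nil
}

With this in place, getDevRoot (quoted above) falls back to "/" whenever the probe fails, covering exactly the two cases listed in this thread.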

@elezar (Member) left a comment:

Thanks @cdesiniotis. I think we're getting there.

I still have some comments and questions, but these are mostly informative.

echo "CONTAINER_DRIVER_ROOT=$CONTAINER_DRIVER_ROOT"

set -o allexport
cat /run/nvidia/validations/driver-ready

Reviewer:

Does it also export the custom toolkit installation path from the toolkit values to the device plugin, for the nvidia-ctk tool?

cdesiniotis (Author):

No. But we set the NVIDIA_CTK_PATH variable here for the device-plugin in the clusterpolicy controller: https://github.com/NVIDIA/gpu-operator/blob/main/controllers/object_controls.go#L1284.

…ripts

This commit updates the driver validation to always create a 'driver-ready' file, regardless of whether the driver is installed on the host or not. It also populates this file with a list of environment variables, one per line, which are required by some operands.

The startup scripts for several operands are simplified to source the content of this file before executing the main program for the container.

Signed-off-by: Christopher Desiniotis <[email protected]>
@cdesiniotis cdesiniotis force-pushed the custom-roots branch 4 times, most recently from 292e8d6 to 11757cf Compare June 15, 2024 20:30
@cdesiniotis cdesiniotis requested review from elezar and tariq1890 June 15, 2024 20:37
@elezar (Member) left a comment:

I'm happy to move forward with this as is and then address any findings from testing as follow-ups.

@@ -12,3 +12,9 @@ rules:
   - use
   resourceNames:
   - privileged
+- apiGroups:
Member:
I was not aware of the toolkit inspecting daemonsets. Was this change intentional?

Member:
Is this because the validator runs here too? I thought we changed the logic to just look for the ready file?

cdesiniotis (Author):
Yes, it is because the validator runs here as an init container. We still need to run the driver validation here since the operator-validator pod runs with the nvidia runtime class and thus needs the toolkit pod to run first before it can start performing any validations.

@@ -67,6 +71,8 @@ spec:
           value: "management.nvidia.com/gpu"
         - name: NVIDIA_VISIBLE_DEVICES
           value: "void"
+        - name: TOOLKIT_PID_FILE
Member:
For reference: this requires NVIDIA/nvidia-container-toolkit#544.

		devRoot = hostRoot
		devRootCtrPath = "/host"
	} else {
		devRoot = driverInstallDir
Member:
Just so that I'm clear, this is only triggered for managed drivers?

cdesiniotis (Author):
No. This logic applies to any driver not installed on the host directly at /. So this can be the operator-managed driver or an unmanaged driver, like GKE's daemonset.

cdesiniotis (Author):
The driverInstallDir that is referenced here will correspond to what is set in ClusterPolicy as hostPaths.driverInstallDir.
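Putting the two replies together, a hedged reconstruction of the quoted branch as a function (the names come from the diff; the container-side path in the non-host case is an assumption, since the excerpt ends before devRootCtrPath is set in the else branch):

package validator

func getDevRootPaths(isHostDriver bool, hostRoot, driverInstallDir string) (devRoot, devRootCtrPath string) {
	if isHostDriver {
		// Driver installed directly on the host at "/": device nodes
		// live under the host root, mounted at /host in the container.
		return hostRoot, "/host"
	}
	// Operator-managed or unmanaged driver container (e.g. GKE's
	// daemonset): device nodes are rooted at hostPaths.driverInstallDir.
	// Assumption: that directory is mounted at its host path inside the
	// validator container, so the container path matches.
	return driverInstallDir, driverInstallDir
}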

		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
	}
	return cmd.Run()
Member:
My understanding is that nvidia-smi expects the device nodes to be created at /dev and that there isn't a way to control this.

Comment on lines +802 to +808
statusFileContent := strings.Join([]string{
	fmt.Sprintf("IS_HOST_DRIVER=%t", driverInfo.isHostDriver),
	fmt.Sprintf("NVIDIA_DRIVER_ROOT=%s", driverInfo.driverRoot),
	fmt.Sprintf("DRIVER_ROOT_CTR_PATH=%s", driverInfo.driverRootCtrPath),
	fmt.Sprintf("NVIDIA_DEV_ROOT=%s", driverInfo.devRoot),
	fmt.Sprintf("DEV_ROOT_CTR_PATH=%s", driverInfo.devRootCtrPath),
}, "\n") + "\n"
Member:
nit: is this something that we could hang off the driverInfo type?
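A sketch of what the nit suggests, with the rendering hung off driverInfo as a method; the struct here only mirrors the fields used in the snippet above and is not the PR's full type:

package validator

import (
	"fmt"
	"strings"
)

type driverInfo struct {
	isHostDriver      bool
	driverRoot        string
	driverRootCtrPath string
	devRoot           string
	devRootCtrPath    string
}

// statusFileContent renders the driver-ready status file, one
// KEY=VALUE environment variable per line, exactly as the inline
// version above does.
func (d driverInfo) statusFileContent() string {
	return strings.Join([]string{
		fmt.Sprintf("IS_HOST_DRIVER=%t", d.isHostDriver),
		fmt.Sprintf("NVIDIA_DRIVER_ROOT=%s", d.driverRoot),
		fmt.Sprintf("DRIVER_ROOT_CTR_PATH=%s", d.driverRootCtrPath),
		fmt.Sprintf("NVIDIA_DEV_ROOT=%s", d.devRoot),
		fmt.Sprintf("DEV_ROOT_CTR_PATH=%s", d.devRootCtrPath),
	}, "\n") + "\n"
}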

cdesiniotis and others added 5 commits June 17, 2024 10:37
…naged by GPU Operator

The commit updates our driver validator to only check for the presence of .driver-ctr.ready, a file created by our driver daemonset's readiness probe, if the driver container is managed by GPU Operator. This allows us to support non-standard environments, like GKE, where a driver container is deployed but not managed by GPU Operator.

Signed-off-by: Christopher Desiniotis <[email protected]>
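A hedged sketch of the managed-only check this commit describes; the .driver-ctr.ready file name comes from the commit message, while the function shape and validationDir parameter are assumptions:

package validator

import (
	"os"
	"path/filepath"
)

// driverContainerReady gates on .driver-ctr.ready only when the driver
// container is managed by GPU Operator; unmanaged driver containers
// (e.g. GKE's daemonset) never create this file.
func driverContainerReady(managedByOperator bool, validationDir string) (bool, error) {
	if !managedByOperator {
		return true, nil
	}
	_, err := os.Stat(filepath.Join(validationDir, ".driver-ctr.ready"))
	if err == nil {
		return true, nil
	}
	if os.IsNotExist(err) {
		return false, nil
	}
	return false, err
}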
This commit updates our driver validator code to only chroot when validating a host-installed driver. When validating a driver container install, we discover the paths to libnvidia-ml.so.1 and nvidia-smi at the driver container root and then run 'LD_PRELOAD=/driverRoot/path/to/libnvidia-ml.so.1 nvidia-smi'.

This sets the stage for validating driver container installs where driverRoot does not represent a full filesystem hierarchy that one can 'chroot' into.

Signed-off-by: Christopher Desiniotis <[email protected]>
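A minimal sketch of the non-chroot validation flow described above; the helper name is hypothetical, and smiPath and nvmlPath are assumed to have been discovered under the driver-container root beforehand (the discovery step is not shown):

package validator

import (
	"os"
	"os/exec"
)

// runNvidiaSMI runs the discovered nvidia-smi binary with LD_PRELOAD
// pointing at the discovered libnvidia-ml.so.1, so the driver's NVML is
// resolved from the driver root without chroot-ing into it.
func runNvidiaSMI(smiPath, nvmlPath string) error {
	cmd := exec.Command(smiPath)
	cmd.Env = append(os.Environ(), "LD_PRELOAD="+nvmlPath)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}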
RootFS represents the path to the root filesystem of the host. This is used by components that need to interact with the host filesystem and, as such, it must be a chroot-able filesystem. Examples include the MIG Manager and the Toolkit Container, which may need to stop, start, or restart systemd services.

Signed-off-by: Christopher Desiniotis <[email protected]>
Co-authored-by: Angelos Kolaitis <[email protected]>
…and containers except the driver-validator

Having a static path inside our containers will make it easier when driverRoot is a configurable field. If driverRoot is set to a custom path, we can transform the host path for the volume while keeping the container path unchanged.

The driver-validation initContainer is the exception to this rule. In the driver-validation initContainer, the container path must match the host path; otherwise, the /dev/char symlinks will not resolve correctly on the host. The target of the symlinks must correspond to the path of the device nodes on the host. For example, when the NVIDIA device nodes are present under `/run/nvidia/driver/dev` on the host, running the following command from inside the container would create an invalid symlink:

  ln -s /driver-root/dev/nvidiactl /host-dev-char/195:255

while running the below command from inside the container would create a valid symlink:

  ln -s /run/nvidia/driver/dev/nvidiactl /host-dev-char/195:255

Signed-off-by: Christopher Desiniotis <[email protected]>
This allows for non-standard driver container installations, where
the driver installation path and device nodes are rooted at paths
other than '/run/nvidia/driver'.

Note: setting driverInstallDir to a custom value is currently only supported for driver container installations not managed by GPU Operator, for example, the GKE use case where a driver daemonset is deployed prior to installing GPU Operator and the GPU Operator-managed driver is disabled.

The GPU Operator's driver container daemonset still assumes that
the full driver installation is made available at '/run/nvidia/driver'
on the host, and consequently, we always mount '/run/nvidia/driver'
into the GPU Operator managed daemonset. We may consider removing this
assumption in the future and support driver container implementations
which allow for a custom driverInstallDir to be specified.

Signed-off-by: Christopher Desiniotis <[email protected]>
@cdesiniotis changed the title from "Add hostRoot, driverInstallDir, and devPath options to ClusterPolicy" to "Add rootFS and driverInstallDir fields to ClusterPolicy" on Jun 17, 2024.
@cdesiniotis cdesiniotis enabled auto-merge June 17, 2024 18:22
@cdesiniotis cdesiniotis merged commit 2c4f301 into main Jun 17, 2024
11 checks passed
@tariq1890 tariq1890 deleted the custom-roots branch June 17, 2024 19:18