GPU Operator v23.9.2: driver image is missing for bottlerocket1.19.2 #709

Closed
6 tasks
OS-walidslim opened this issue Apr 29, 2024 · 1 comment

OS-walidslim commented Apr 29, 2024

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.

1. Quick Debug Information

  • OS/Version: bottlerocket1.19.2
  • Kernel Version:
  • Container Runtime Type/Version: Containerd
  • K8s Flavor/Version: EKS
  • GPU Operator Version: 23.9.2

2. Issue or feature description

Failed to pull image "nvcr.io/nvidia/driver:535.104.05-bottlerocket1.19.2": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:535.104.05-bottlerocket1.19.2": failed to resolve reference "nvcr.io/nvidia/driver:535.104.05-bottlerocket1.19.2": nvcr.io/nvidia/driver:535.104.05-bottlerocket1.19.2: not found

The image also does not exist in the image repository.
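
For context, the tag in the error appears to combine the driver version (535.104.05) with an OS identifier for the node (bottlerocket plus 1.19.2). The sketch below only illustrates that composition under this assumption; the helper name `driver_image_tag` and its arguments are hypothetical and are not part of the GPU Operator.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the image reference seen in the error message.
# Assumption: the tag has the form <driver-version>-<os-id><os-version>.
driver_image_tag() {
    driver_version="$1"   # e.g. 535.104.05
    os_id="$2"            # e.g. bottlerocket
    os_version="$3"       # e.g. 1.19.2
    printf '%s-%s%s\n' "$driver_version" "$os_id" "$os_version"
}

# Compose the full reference that the kubelet tried to pull.
image="nvcr.io/nvidia/driver:$(driver_image_tag 535.104.05 bottlerocket 1.19.2)"
echo "$image"
```

If no image has been published under that tag for the node's OS, the pull fails with the NotFound error shown above.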

3. Steps to reproduce the issue

Deploy the GPU Operator charts on EKS.

4. Information to attach (optional if deemed irrelevant)

  • kubernetes pods status: kubectl get pods -n OPERATOR_NAMESPACE
  • kubernetes daemonset status: kubectl get ds -n OPERATOR_NAMESPACE
  • If a pod/ds is in an error state or pending state: kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME
  • If a pod/ds is in an error state or pending state: kubectl logs -n OPERATOR_NAMESPACE POD_NAME --all-containers
  • Output from running nvidia-smi from the driver container: kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi
  • containerd logs: journalctl -u containerd > containerd.log

Collecting full debug bundle (optional):

curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/hack/must-gather.sh 
chmod +x must-gather.sh
./must-gather.sh

NOTE: please refer to the must-gather script for debug data collected.

This bundle can be submitted to us via email: [email protected]

@cdesiniotis
Contributor

Hi @OS-walidslim, this is expected behavior as we do not support Bottlerocket OS. Please refer to: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html#supported-operating-systems-and-kubernetes-platforms
