libnvidia-ml.so.1 not found under /usr. #1149
I'm not familiar with whatever YAML spec this is, so it's a bit hard to reason about how the request for GPUs is mapped to your container. That said, when you exec into the container, are you able to run …? Did this used to work without time-slicing enabled?
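A minimal sketch of that check, assuming the pod name that appears later in this thread and that the image ships a shell:

```sh
# Pod name taken from later in this thread -- substitute your failing pod.
POD=nemo-embedding-embedding-deployment-6c7567d84c-mrnrq

# Check whether the driver library is visible inside the container.
microk8s kubectl exec -it "$POD" -- sh -c 'ldconfig -p | grep libnvidia-ml || find / -name "libnvidia-ml.so.1" 2>/dev/null'

# If the library resolves, nvidia-smi should also work.
microk8s kubectl exec -it "$POD" -- nvidia-smi
```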
I am trying to deploy this blueprint. Exec'ing into the gpu-operator gpu-feature-discovery container, I see this:
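For reference, one way to locate and inspect that container is sketched below; the gpu-operator namespace and the app=gpu-feature-discovery label are assumptions about a default gpu-operator install, not values from this thread.

```sh
# Find the gpu-feature-discovery pod (label/namespace are assumptions).
microk8s kubectl get pods -n gpu-operator -l app=gpu-feature-discovery

# Check for the driver library inside it (replace <gfd-pod-name>).
microk8s kubectl exec -n gpu-operator -it <gfd-pod-name> -- sh -c 'ldconfig -p | grep libnvidia-ml'
```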
@klueska anything?
I meant exec'ing into the container where you are seeing the error that libnvidia-ml.so isn't found.
I understand, but the pod is in CrashLoopBackOff status, so I can't exec into it.
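Even without exec, the output from the last crash is usually still retrievable; a sketch, reusing the pod name from below:

```sh
POD=nemo-embedding-embedding-deployment-6c7567d84c-mrnrq

# Shows the last termination reason and exit code for each container.
microk8s kubectl describe pod "$POD"

# --previous prints the logs of the previously crashed container instance.
microk8s kubectl logs "$POD" --previous
```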
Can you change its entrypoint to just `sleep` so the container stays up and you can exec into it?
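One way to do that without rebuilding the image is a JSON patch on the deployment; the deployment name and container index below are guesses based on the pod name in this thread:

```sh
# A sketch: override the container command so the pod idles instead of crashing.
# Adjust the deployment name and container index to match your manifest.
microk8s kubectl patch deployment nemo-embedding-embedding-deployment \
  --type json \
  -p '[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "9999"]}]'
```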
After `sleep 9999`:

`microk8s kubectl get pod nemo-embedding-embedding-deployment-6c7567d84c-mrnrq -o yaml`
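A quick way to pull just the GPU request out of that YAML (a sketch; assumes the resource is exposed as nvidia.com/gpu):

```sh
# Show what each container in the pod actually requests -- look for nvidia.com/gpu.
microk8s kubectl get pod nemo-embedding-embedding-deployment-6c7567d84c-mrnrq \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
```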
I am using time-slicing; I replicated to 4 GPUs.
Why am I getting this error now?
The gpu-operator pods are running fine.
This is the YAML I am trying to deploy.
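For context, time-slicing with 4 replicas is normally expressed as a device-plugin sharing config delivered through a ConfigMap that the gpu-operator's ClusterPolicy points at. The snippet below is a sketch of that shape; the ConfigMap name, namespace, and data key are assumptions rather than values from this thread.

```sh
# A sketch of a time-slicing config applied as a ConfigMap for the device plugin.
# Match the name, namespace, and key to your ClusterPolicy's devicePlugin.config.
cat <<'EOF' | microk8s kubectl apply -n gpu-operator -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
EOF
```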