feat: Add AL2023 GPU AMI #362
base: main
nvidia-fabric-manager \
pciutils \
xorg-x11-server-Xorg \
oci-add-hooks \
This shouldn't be needed: the NVIDIA container toolkit will add the hooks. Likewise for the next two libnvidia-container* deps, which should be handled by the container toolkit.
You can see what we do with the EKS AL2023 NVIDIA AMI here: awslabs/amazon-eks-ami#1924
Note: we dual-ship both the proprietary NVIDIA driver and the open GPU kernel module, and load the correct one during instance provisioning based on the GPU card. Older cards require the proprietary driver; newer cards require the open GPU kernel module.
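To make the dual-ship idea concrete, here is a minimal sketch of how a provisioning script might pick between the two modules. It is not taken from the EKS AMI or from this PR: the function name `select_nvidia_kmod`, the idea of passing a list of "legacy" device IDs, and the IDs in the usage example are all hypothetical placeholders.

```shell
# Hypothetical helper (not from the EKS AMI or this PR): decide which
# NVIDIA kernel module flavor to load for a given GPU.
#   $1: PCI device ID of the GPU
#   $2: space-separated device IDs assumed to need the proprietary driver
select_nvidia_kmod() {
  local device_id="$1"
  local legacy_ids="$2"
  local id
  for id in $legacy_ids; do
    if [ "$id" = "$device_id" ]; then
      # Card is on the legacy list: use the proprietary driver.
      echo "proprietary"
      return 0
    fi
  done
  # Anything not on the legacy list gets the open GPU kernel module.
  echo "open"
}

# Placeholder IDs, for illustration only.
select_nvidia_kmod "aaaa" "aaaa bbbb"
select_nvidia_kmod "cccc" "aaaa bbbb"
```

On a real instance the device ID would presumably come from something like `lspci -n -d 10de:` (10de is NVIDIA's PCI vendor ID, and pciutils is already in the package list above), and the legacy list would need to be maintained from NVIDIA's own compatibility data.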
fixes #319
Summary
Adds build support to generate a GPU-enabled ECS AMI based on AL2023
Implementation details
I tried my best to maintain the existing support in scripts/enable-ecs-agent-gpu-support.sh. At least for AL2023, I think this could be simplified a lot, but again I didn't want to cause too much churn.
Testing
Built and published an image to my account, created an image and validated that nvidia-smi worked and Docker runtimes were enabled.
Per NVIDIA's documentation (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html#running-a-sample-workload-with-docker), this command works as expected:
I also did some basic tests:
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.