-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for --gpus options in Rocker (shm merged separately) #304
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall both of these would be great. I think that the --shm-size pass through completely makes sense. it can be merged quickly if we add tests.
The --gpus option should be thought about a bit more as to how it will dovetail with the current --nvidia option. And hopefully can replace the current recommendation for --device /dev/dri/card0
src/rocker/extensions.py
Outdated
|
||
def get_docker_args(self, cliargs): | ||
args = '' | ||
shm_size = cliargs.get('gpus', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some copy and paste here
setup.py
Outdated
@@ -52,6 +52,8 @@ | |||
'expose = rocker.extensions:Expose', | |||
'git = rocker.git_extension:Git', | |||
'group_add = rocker.extensions:GroupAdd', | |||
'shm_size = rocker.extensions:ShmSize', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in alphabetical order.
Thank you for the review. I have made the requested changes, and added tests for both new features.
In addition to replace the current recommendation for --device /dev/dri/card0, the new |
@tfoote I have also modified the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests. This looks like it's basically ready to merge. The interactions passing through the gpu arguments to --nvidia is a great way to get them to work together. I have one request for the code organization, but then this looks good to go.
src/rocker/extensions.py
Outdated
help="Set the size of the shared memory for the container (e.g., 512m, 1g).") | ||
|
||
|
||
class Gpus(RockerExtension): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the cross coupling can you move this to the nvidia_extensions file instead? It's not explicitly nvidia but I'd like to keep them colocated. And that file could potentially be renamed in the future with it's expanded scope. The X11 class is already not nvidia tied.
test/test_extension.py
Outdated
args = p.get_docker_args(mock_cliargs) | ||
self.assertIn('--shm-size 12g', args) | ||
|
||
class GpusExtensionTest(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also go into the test_nvidia.py in parallel with moving the implementation
@tfoote Thank you. I have made the requested changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup.
Unfortunately I tried testing this out on my non-nvidia machine and was quite disappointed to discover that it doesn't work. On deeper reading the --gpus option only works with NVIDIA gpus, and not generic gpus or integrated ones.
https://docs.docker.com/reference/cli/docker/container/run/#gpus
The --gpus flag allows you to access NVIDIA GPU resources. First you need to install the [nvidia-container-runtime](https://nvidia.github.io/nvidia-container-runtime/).
So I think that it makes sense to just collapse this into the nvidia plugin implementation. I'd like to have the argument be --nvidia-gpus
but we can provide an alias to --gpus
to mirror the upstream command line syntax.
And to support your use case of using nvidia without the content being added into the container we could extend the policy to support NVIDIA_GLVND_VALID_VERSIONS with None as a valid option.
Potentially as these tools get more integrated this might be able to become a default policy. A shortcut option --nvidia-no-glvnd that sets that policy would be reasonable to consider, but I'm not sure it's worth the extra complexity.
|
||
@staticmethod | ||
def register_arguments(parser, defaults={}): | ||
parser.add_argument('--gpus', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parser.add_argument('--gpus', | |
parser.add_argument('--nvidia-gpus', '--gpus', |
And this would move up into the nvidia plugin.
I've separated out the shm extension and merged it in #306 so now it's just about cleaning up the nvidia gpu options. |
This PR introduces support for two new options,
--shm_size
and--gpus
, which are equivalent to the respective arguments in Docker. These options provide greater flexibility and usability when customizing container environments, particularly for resource-intensive and GPU-dependent workloads.