GPU acceleration on a single machine #900
Comments
Yeah, it looks like Toil doesn't believe you're on a machine with a GPU. I think Toil is relying on nvidia-smi to count them. Unfortunately, there's no real work-around on the cactus end that I can think of: GPU jobs need to get assigned GPUs via Toil starting in this release.

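For illustration, this kind of check usually boils down to shelling out to nvidia-smi and counting the devices it reports. The sketch below is an assumption about the general approach, not Toil's or Cactus's actual code:

```python
# Sketch only: the general shape of an nvidia-smi based GPU count,
# not Toil's or Cactus's actual implementation.
import subprocess

def count_gpus_via_nvidia_smi():
    """Return the number of GPUs listed by `nvidia-smi -L`."""
    # `nvidia-smi -L` prints one line per GPU, e.g.
    # "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-...)"
    output = subprocess.check_output(["nvidia-smi", "-L"], text=True)
    return sum(1 for line in output.splitlines() if line.strip())

if __name__ == "__main__":
    print(count_gpus_via_nvidia_smi())
```

If that call raises (for example because nvidia-smi is not on the PATH of the process running it), auto-detection has nothing to go on, which matches the error discussed below.
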
(I should have said I'm using Cactus 2.4.0.) Yes:

and:

Should I re-raise the issue on Toil?

Yes, please do, as I'm 99% sure this is on the Toil end. And judging by the output of your nvidia-smi, it should be detecting your 4 GPUs without issue. I'll ping @adamnovak here too.

Would also be curious to know what happens when you run with:

The argument of --gpu seems mandatory:

```
[2023-01-11T20:50:29+0000] [MainThread] [I] [toil.statsAndLogging] Cactus Commit: 47f9079
Traceback (most recent call last):
  File "/home/cactus/cactus_env/bin/cactus", line 8, in <module>
    sys.exit(main())
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/progressive/cactus_progressive.py", line 372, in main
    config_wrapper.initGPU(options)
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/configWrapper.py", line 274, in initGPU
    lastz_gpu = get_gpu_count()
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/configWrapper.py", line 261, in get_gpu_count
    raise RuntimeError('Unable to automatically determine number of GPUs: Please set with --gpu N')
RuntimeError: Unable to automatically determine number of GPUs: Please set with --gpu N
```

That message happens when cactus can't get a count from that Toil function either. I guess it confirms that the nvidia-smi invocation is failing on both ends when called from Python, despite looking fine on the console. If you could open python and try running those commands, it may reveal the problem.

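In case it helps, here is one way to do that interactively and see exactly how the invocation fails; this is a hypothetical diagnostic, since the exact commands aren't shown above:

```python
# Paste into a `python` session on the machine (or inside the container)
# where cactus runs, to see how the nvidia-smi probe fails.
import shutil
import subprocess

print("nvidia-smi on PATH:", shutil.which("nvidia-smi"))
try:
    result = subprocess.run(["nvidia-smi", "-L"],
                            capture_output=True, text=True, check=True)
    print("nvidia-smi output:")
    print(result.stdout)
except FileNotFoundError:
    print("nvidia-smi is not visible from this Python process")
except subprocess.CalledProcessError as exc:
    print("nvidia-smi exited with", exc.returncode)
    print(exc.stderr)
```

A FileNotFoundError here would mean the binary simply isn't visible from where Python runs, even though it works on the console.
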
I've found the problem! Thanks for the help 🤝

I think you need to run Cactus in a top-level Singularity container that has GPUs available, and the NVIDIA userspace binaries (like nvidia-smi) to access them. Can't you mount those in somehow?

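A quick way to confirm what the container can actually see (a hedged sketch assuming a typical NVIDIA setup; Singularity normally passes the host's GPU devices and userspace tools through when launched with its --nv option):

```python
# Run inside the container: checks whether the NVIDIA userspace tools and
# device nodes are visible there. Paths assume a typical NVIDIA setup.
import glob
import shutil

print("nvidia-smi:", shutil.which("nvidia-smi") or "not found")
print("NVIDIA device nodes:", glob.glob("/dev/nvidia*") or "none visible")
```
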
Ah ok, running in Singularity is an important detail. Still strange, because the gpu-enabled cactus image (…)

Sorry guys, I forgot the …

Hello,

This is the equivalent ticket of #887, but for the single_machine batch system.

According to the release notes:

which sounds like it should work?

I'm running this command: