SlurmBatchSystem "does not support any accelerators" when running on a Slurm GPU cluster #887
Sorry, I haven't implemented support for GPUs in the Toil SlurmBatchSystem. How does your Slurm cluster do GPUs, @oneillkza? It looks like some (all?) clusters use a generic resource (GRES) of gpu. It seems like Slurm also has AMD ROCm support, but it doesn't really give you a way (beyond the "type", which can be exact model numbers like "a100") to say whether you want a CUDA API or a ROCm API.
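For context, here is a hedged sketch of what a GRES GPU request typically looks like in a Slurm batch script. The option names are standard Slurm, but the typed variant (e.g. a100) depends on how each site configures its GRES, so treat this as illustrative rather than anything this cluster specifies:

```shell
#!/bin/bash
# Illustrative sbatch script (not from this thread): ask Slurm for one
# generic GPU via GRES; a typed request like gpu:a100:1 is site-dependent.
#SBATCH --gres=gpu:1
# #SBATCH --gres=gpu:a100:1   # typed variant, if the site defines it

# When the request is granted, Slurm exports CUDA_VISIBLE_DEVICES with the
# indices of the devices assigned to this job.
echo "Assigned GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
```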
Thanks @adamnovak -- yep, we use a GRES of gpu. (Our cluster is a bunch of servers running NVidia CUDA-capable cards, mainly 3090s, with eight GPUs per node.)
@thiagogenez just noting that to run Cactus on a local Slurm cluster, this is also necessary (i.e. using the latest code for Cactus, incorporating #844, as well as waiting for DataBiosphere/toil#4308).
Hello, I am using a Slurm cluster to run Cactus, but after I run `module load cactus` I get a `toil_worker: command not found` error. I don't know if you have encountered this; how did you run it? Thank you so much for your guidance. I'm a newbie and this has been bugging me for ages.
Hi @790634750, can you share the details of how you are calling Cactus in your Slurm environment and the errors you get, please? Then I can give you better answers. Cheers
Hi @thiagogenez, when I run `conda install -c anaconda gcc_linux-64`, the download fails.
Hi @790634750, the easiest way to run Cactus is to use containers. I strongly recommend using the provided Docker image. If you have Singularity available:
# if you don't have a GPU available
singularity pull --name cactus.sif docker://quay.io/comparative-genomics-toolkit/cactus:v2.4.0
# if you have a GPU available
singularity pull --name cactus-gpu.sif docker://quay.io/comparative-genomics-toolkit/cactus:v2.4.0-gpu
# if you don't have a GPU available
singularity run cactus.sif cactus --help
# if you have a GPU available
singularity run --nv cactus-gpu.sif cactus --help
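To tie this to the Slurm question, a hedged sketch of wrapping the pulled image in a batch job; the job name, resource values, and arguments are placeholders I've assumed for illustration, not anything this thread prescribes:

```shell
#!/bin/bash
# Illustrative only: run the pulled Cactus image inside a Slurm job.
#SBATCH --job-name=cactus
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --gres=gpu:1          # only if using the -gpu image

# --nv exposes the host NVIDIA driver stack inside the container
# (omit it, and use cactus.sif, for the CPU-only image)
singularity run --nv cactus-gpu.sif cactus --help
```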
Hi @thiagogenez
I believe there is a misconfiguration of the cactus module on your cluster. The
To install the Cactus Python module, download the Cactus binaries here: https://github.com/ComparativeGenomicsToolkit/cactus/releases and install using the linked instructions: BIN_INSTALL.md You should not need to |
@glennhickey I've been trying out the latest code in #884 to enable requesting accelerators from Toil, but am now getting the following error:
I'm not sure whether this is an upstream issue, i.e. Toil just hasn't implemented support for GPU resources on Slurm yet. @adamnovak, is that the case? Or is this something I need to set somewhere?
(Note that we run NextFlow on this cluster pretty regularly, and it has no trouble requesting GPUs from the scheduler and then having individual jobs use the right ones based on $CUDA_VISIBLE_DEVICES.)