[Issue]: Why does linking the CXX shared library librccl.so take several hours when building rccl with make? #1430
Comments
Did you use More info can be found with
I have the same issue. I tried the ./install.sh -l script, and it did reduce the compile time somewhat, but linking still takes around 10 minutes. You mentioned that build time was improved recently; could you let me know which specific commit implemented this? I'd like to give it a try. Thank you for your help!
This is what we expect from the current version on some GPU models. Further build-time optimization is an ongoing effort.
There are a couple. The one I remember is #1371. The general idea is that we were building kernels for every possible combination, some with multiple unroll variations, and linking them all together takes a long time. We identified some kernels that were unnecessary and removed them. As a side note, if you only need a few collective ops, you can choose to build only those ops instead of all of them. You can also build for only one datatype (say, 32-bit allreduce).
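To give a rough feel for why building every combination is expensive, here is a minimal Python sketch that counts kernel variants as a plain cross product. The dimension lists below (collectives, datatypes, reduction ops, algorithms, protocols, unroll factors) and the `count_variants` helper are illustrative assumptions, not the actual RCCL kernel matrix, and the model ignores the fact that some collectives do not use a reduction op.

```python
from itertools import product

# Hypothetical dimensions for illustration only; the real RCCL matrix differs.
COLLECTIVES = ["AllReduce", "AllGather", "ReduceScatter", "Broadcast", "Reduce"]
DATATYPES = ["i8", "u8", "i32", "u32", "i64", "u64", "f16", "f32", "f64", "bf16"]
REDOPS = ["Sum", "Prod", "Min", "Max"]
ALGOS = ["Ring", "Tree"]
PROTOS = ["LL", "LL128", "Simple"]
UNROLLS = [1, 2, 4]


def count_variants(collectives=COLLECTIVES, datatypes=DATATYPES, redops=REDOPS,
                   algos=ALGOS, protos=PROTOS, unrolls=UNROLLS):
    """Count kernel variants as the full cross product of all dimensions."""
    return len(list(product(collectives, datatypes, redops, algos, protos, unrolls)))


if __name__ == "__main__":
    # Every combination of the assumed dimensions.
    print("all combinations:", count_variants())  # 3600
    # Restricting to a single op and datatype (e.g. f32 AllReduce with Sum).
    print("AllReduce/f32/Sum only:",
          count_variants(collectives=["AllReduce"], datatypes=["f32"], redops=["Sum"]))  # 18
```

Under these assumed sizes, restricting the build to one collective, one datatype, and one reduction op cuts the number of objects to link from 3600 to 18, which is the kind of reduction the comment above is describing.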
I also encountered this issue. It keeps getting stuck when I compile inside the container. Have you resolved it?
Hi @etoilestar, can you share more details about your container, GPU, and installation method? Using
Problem Description
[Issue]: Why does linking the CXX shared library librccl.so take several hours when building rccl with make?
Operating System
ubuntu-24.04
CPU
Intel(R) Core(TM) i7-14700K
GPU
2x AMD Radeon RX GPU 7900XT
ROCm Version
ROCm 6.2.3
ROCm Component
HIPCC
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response