CUDA: use arch list for compatibility check #11775
Conversation
Wouldn't it be possible to check if the architecture is in …?
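For context, a minimal sketch of what such a check could look like, assuming the __CUDA_ARCH_LIST__ macro that nvcc defines (CUDA 11.5 and newer); the helper name is hypothetical, not ggml's actual API:

```cpp
#include <initializer_list>

// Hypothetical helper: returns true if device code was compiled for exactly
// this architecture. __CUDA_ARCH_LIST__ expands to a comma-separated list of
// the compiled virtual architectures, e.g. 520,610,700,750.
static bool arch_in_compiled_list(const int arch) {
#if defined(__CUDA_ARCH_LIST__)
    for (const int compiled : {__CUDA_ARCH_LIST__}) {
        if (compiled == arch) {
            return true;
        }
    }
    return false;
#else
    (void) arch;  // older toolkits do not define the macro
    return true;  // fall back to the previous behaviour: assume availability
#endif
}
```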
Force-pushed from c80a441 to 23ef203
Force-pushed from 23ef203 to 7ae0912
I pushed a higher-effort fix. I think the correct way to do it is to change the functions like …

I'm using Manjaro with CUDA 12.6 on my systems. For whatever reason the CUDA cross compile is broken (it fails when trying to run the code) and I so far did not bother to debug why because I don't need it. So the code I pushed is not properly tested. Either someone else needs to assert that it works correctly by compiling only for compute capability 5.2, or reviewing will need to wait until I've gotten around to fixing my setup.
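As a rough illustration of that direction (a sketch only, with assumed names rather than the code actually pushed): the availability checks could be based on the highest architecture that is both compiled into the binary and usable on the running device, instead of on the raw device capability.

```cpp
#include <initializer_list>

// Sketch with assumed names: the highest compiled architecture that a device
// with compute capability `device_cc` (written as major*100 + minor*10, so
// 6.0 -> 600) can actually execute.
static int highest_compiled_arch(const int device_cc) {
#if defined(__CUDA_ARCH_LIST__)
    int best = 0;
    for (const int arch : {__CUDA_ARCH_LIST__}) {  // e.g. 520,610,700,750
        if (arch <= device_cc && arch > best) {
            best = arch;
        }
    }
    return best;
#else
    (void) device_cc;
    return device_cc;  // old toolkit: keep the previous, optimistic behaviour
#endif
}

// With the default list {520,610,700,750} a GP100 (cc 600) yields 520 above,
// so a gate like this one reports that the FP16 intrinsics are not usable.
static bool fp16_intrinsics_usable(const int device_cc) {
    return highest_compiled_arch(device_cc) >= 600;
}
```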
Not sure if that's what you need, but I can compile with -DCMAKE_CUDA_ARCHITECTURES=52 and it runs.
If an FP16 model or …
Noted, but this is 100% already an issue on master (and so far no one has noticed). Does …?
Force-pushed from 7ae0912 to 165edb3
Actually, if you edit …
The F16 test cases and models work. With the latest commit it fails in a different case: …
Force-pushed from 165edb3 to d55e584
I went through the uses of …
Just as I had pressed enter on the previous post I realized that the logic for … Also, I reverted an incorrect change to the logic regarding whether FlashAttention kernels are supported.
Co-authored-by: Diego Devesa <[email protected]>
Fixes #10318 (comment).
The problem is that by default the code is being compiled for compute capabilities 5.2, 6.1, 7.0, and 7.5. A GP100 has compute capability 6.0, the minimum for FP16 intrinsics. The host code says that it can do MMV with those intrinsics, but without GGML_CUDA_F16 there is no actual device code available. This PR is more of a band-aid fix that just makes GPUs with compute capability 6.0 use FP32 arithmetic if the code was not compiled with GGML_CUDA_F16. Medium-term I intend to revise the handling of these intrinsics and I'll do a proper fix at that time.
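A simplified sketch of the band-aid behaviour described above (assumed name, not the actual ggml-cuda function):

```cpp
// Host-side decision whether the FP16 MMV path may be used for a device of
// compute capability `cc` (written as major*100 + minor*10, so 6.0 -> 600).
static bool fp16_mmv_ok(const int cc) {
    if (cc < 600) {
        return false;  // no FP16 intrinsics before compute capability 6.0
    }
#ifndef GGML_CUDA_F16
    if (cc == 600) {
        return false;  // default arch list has no 6.0 FP16 device code,
                       // so fall back to FP32 arithmetic on such GPUs
    }
#endif
    return true;
}
```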