This repository was archived by the owner on Aug 10, 2024. It is now read-only.
In your 0000-add-support-for-conversion-fp16-to-fp32.patch you convert FP16 to FP32. Is there a way to limit this to compute 6.1 so that the P100 (compute 6.0) keeps FP16?
The reason is that FP16 is twice as fast as FP32 on the P100.
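For illustration, a minimal sketch of what such a gate could look like on the Python side, assuming the decision is made per device with torch.cuda.get_device_capability (the helper name and the exact gating rule are my own, not part of the patch):

```python
import torch

def should_upcast_fp16_dot(device_index: int = 0) -> bool:
    """Hypothetical gate: upcast FP16 dots to FP32 only on compute 6.1 (P40),
    and keep FP16 on compute 6.0 (P100), which has full-rate FP16 units."""
    major, minor = torch.cuda.get_device_capability(device_index)
    return (major, minor) == (6, 1)

# Example: decide per visible GPU in a mixed 3060 + P100 box.
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i), "-> upcast:", should_upcast_fp16_dot(i))
```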
Edit: I've built Triton and vLLM from this repo and can confirm that the crash above is fixed; at least with a 70B GPTQ model on my 2x3060 + 2xP100 setup I don't see any difference in performance (approx. 15.5 tok/sec).
Is the performance degradation a confirmed behavior? Do you get a crash with the original Triton (Cannot convert f16 to f16, not the LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32 one)? There is an attempt to upcast the dot results to FP32 on the P40, but I'm not sure whether this upcast happens on the P100, since it has good FP16 performance.
If you get the Cannot convert f16 to f16 error with the original Triton, then conversion to FP32 is necessary anyway. If that error is not present, no conversion is required and none occurs (the conversion is applied only when necessary), so you can safely use Triton with this patch.
(There are two patches for Triton: one fixes Cannot convert f16 to f16, the other fixes LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32.)
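As a rough illustration of the idea (not the actual patch), one could imagine a Triton kernel that upcasts its tl.dot operands to FP32 only when a compile-time flag says the device needs it; the kernel, flag name, and gating rule below are assumptions made for this sketch:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def tile_dot_kernel(a_ptr, b_ptr, c_ptr,
                    UPCAST_TO_FP32: tl.constexpr, BLOCK: tl.constexpr):
    # One BLOCK x BLOCK tile of FP16 A and B, row-major.
    # BLOCK must be a power of two >= 16 for tl.dot.
    offs = tl.arange(0, BLOCK)
    a = tl.load(a_ptr + offs[:, None] * BLOCK + offs[None, :])
    b = tl.load(b_ptr + offs[:, None] * BLOCK + offs[None, :])
    if UPCAST_TO_FP32:
        # Mimic the patch's behaviour where FP16 dots cannot be lowered:
        # compute the dot in FP32 instead of FP16.
        a = a.to(tl.float32)
        b = b.to(tl.float32)
    c = tl.dot(a, b)  # accumulates in FP32 either way
    tl.store(c_ptr + offs[:, None] * BLOCK + offs[None, :], c)

def tile_dot(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical rule: only compute capability 6.1 (P40) gets the upcast;
    # 6.0 (P100) keeps FP16 operands.
    upcast = torch.cuda.get_device_capability(a.device) == (6, 1)
    c = torch.empty(a.shape, device=a.device, dtype=torch.float32)
    tile_dot_kernel[(1,)](a, b, c, UPCAST_TO_FP32=upcast, BLOCK=a.shape[0])
    return c
```

On an unpatched Triton for sm_6x, the FP16 tl.dot path is presumably where the Cannot convert f16 to f16 lowering error would surface; with the patch the upcast only kicks in when that lowering would otherwise fail.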