This repository was archived by the owner on Aug 10, 2024. It is now read-only.

0000-add-support-for-conversion-fp16-to-fp32.patch limit to compute 6.1 #2

Open
cduk opened this issue Jun 2, 2024 · 2 comments

cduk commented Jun 2, 2024

In your 0000-add-support-for-conversion-fp16-to-fp32.patch you convert fp16 to fp32. Is there a way to limit this to compute 6.1 so that P100 (compute 6.0) keeps FP16?

The reason is that FP16 is twice as fast on P100.
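For illustration, the gating requested here could look roughly like the sketch below. It is a hypothetical policy function in Python, not code from the patch; the names `wants_fp32_upcast` and `KNOWN_CAPABILITIES` are invented for this example, and the table only lists the GPUs mentioned in this thread.

```python
from typing import Dict, Tuple

# Compute capabilities (major, minor) of the GPUs mentioned in this thread.
# On a real system the tuple could be read with
# torch.cuda.get_device_capability(); it is hard-coded here so the sketch
# runs without a GPU.
KNOWN_CAPABILITIES: Dict[str, Tuple[int, int]] = {
    "P100": (6, 0),
    "P40": (6, 1),
    "RTX 3060": (8, 6),
}


def wants_fp32_upcast(capability: Tuple[int, int]) -> bool:
    """Hypothetical policy: upcast FP16 to FP32 only on compute 6.1."""
    return capability == (6, 1)


if __name__ == "__main__":
    for name, cap in KNOWN_CAPABILITIES.items():
        action = "upcast to FP32" if wants_fp32_upcast(cap) else "keep FP16"
        print(f"{name} (compute {cap[0]}.{cap[1]}): {action}")
```

Under this assumed policy, only the P40 would be upcast, while the P100 and the 3060 would stay on FP16.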


the-crypt-keeper commented Jun 2, 2024

@cduk Trust me, you want this conversion: it prevents a crash with --enable-prefix-caching.

See vllm #4438

Edit: I've built Triton and vLLM from this repo and can confirm that the crash above is fixed. At least on a 70B GPTQ model on my 2x3060 + 2xP100 setup, I don't see any difference in performance (approx. 15.5 tok/s).

Owner

sasha0552 commented Jun 3, 2024

Hi @cduk.

> In your 0000-add-support-for-conversion-fp16-to-fp32.patch you convert fp16 to fp32. Is there a way to limit this to compute 6.1 so that P100 (compute 6.0) keeps FP16?

Is the performance degradation confirmed? Do you get a crash with the original Triton (specifically Cannot convert f16 to f16, not LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32)? There is an attempt to upcast the dot results to FP32 on P40, but I'm not sure whether this upcast happens on P100, since it has good FP16 performance.

If you get the Cannot convert f16 to f16 error with the original Triton, then conversion to FP32 is necessary anyway. If this error does not appear, then no conversion is required and none occurs (conversion only happens when necessary), so you can safely use Triton with this patch.

(There are two patches for triton, one fixes Cannot convert f16 to f16, the other fixes LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32.)
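To make the distinction between the two errors concrete, here is a small sketch that checks a captured error log for either message. The function name `classify_triton_error` and the returned labels are hypothetical and chosen only for this example; only the two error strings come from the comment above.

```python
# Error messages quoted in this thread.
F16_CONVERT_ERROR = "Cannot convert f16 to f16"
SHFL_ERROR = "Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32"


def classify_triton_error(log: str) -> str:
    """Return a hypothetical label for which of the two fixes a log points to."""
    if F16_CONVERT_ERROR in log:
        # Addressed by the fp16-to-fp32 conversion patch.
        return "fp16-to-fp32 conversion"
    if SHFL_ERROR in log:
        # Addressed by the other Triton patch mentioned above.
        return "shfl.sync intrinsic"
    return "neither"


if __name__ == "__main__":
    sample = "LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32"
    print(classify_triton_error(sample))
```

If a log from the original Triton matches neither message, then by the reasoning above the FP32 conversion is not being triggered for that workload.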


3 participants