This repository was archived by the owner on Aug 10, 2024. It is now read-only.
In your 0000-add-support-for-conversion-fp16-to-fp32.patch you convert FP16 to FP32. Is there a way to limit this to compute 6.1 so that the P100 (compute 6.0) keeps FP16?
The reason is that FP16 is twice as fast as FP32 on the P100.
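For illustration, a minimal sketch of what such a gate could look like on the Python side, assuming the decision is made per device with torch.cuda.get_device_capability (the helper name and the exact gating rule are my own, not part of the patch):

```python
import torch

def should_upcast_fp16_dot(device_index: int = 0) -> bool:
    """Hypothetical gate: upcast FP16 dots to FP32 only on compute 6.1 (P40),
    and keep FP16 on compute 6.0 (P100), which has full-rate FP16 units."""
    major, minor = torch.cuda.get_device_capability(device_index)
    return (major, minor) == (6, 1)

# Example: decide per visible GPU in a mixed 3060 + P100 box.
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i), "-> upcast:", should_upcast_fp16_dot(i))
```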
Edit: I've built Triton and vLLM from this repo and can confirm that the crash above is fixed; at least with a 70B GPTQ model on my 2x3060 + 2xP100 setup I don't see any difference in performance (approx. 15.5 tok/sec).
Is the performance degradation a confirmed behavior? Do you get a crash with the original Triton (Cannot convert f16 to f16, not the LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32 one)? There is an attempt to upcast the dot results to FP32 on the P40, but I'm not sure whether this upcast happens on the P100, since it has good FP16 performance.
If you get the Cannot convert f16 to f16 error with the original Triton, then conversion to FP32 is necessary anyway. If that error is not present, no conversion is required and none occurs (the conversion is applied only when necessary), so you can safely use Triton with this patch.
(There are two patches for Triton: one fixes Cannot convert f16 to f16, the other fixes LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32.)
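As a rough illustration of the idea (not the actual patch), one could imagine a Triton kernel that upcasts its tl.dot operands to FP32 only when a compile-time flag says the device needs it; the kernel, flag name, and gating rule below are assumptions made for this sketch:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def tile_dot_kernel(a_ptr, b_ptr, c_ptr,
                    UPCAST_TO_FP32: tl.constexpr, BLOCK: tl.constexpr):
    # One BLOCK x BLOCK tile of FP16 A and B, row-major.
    # BLOCK must be a power of two >= 16 for tl.dot.
    offs = tl.arange(0, BLOCK)
    a = tl.load(a_ptr + offs[:, None] * BLOCK + offs[None, :])
    b = tl.load(b_ptr + offs[:, None] * BLOCK + offs[None, :])
    if UPCAST_TO_FP32:
        # Mimic the patch's behaviour where FP16 dots cannot be lowered:
        # compute the dot in FP32 instead of FP16.
        a = a.to(tl.float32)
        b = b.to(tl.float32)
    c = tl.dot(a, b)  # accumulates in FP32 either way
    tl.store(c_ptr + offs[:, None] * BLOCK + offs[None, :], c)

def tile_dot(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical rule: only compute capability 6.1 (P40) gets the upcast;
    # 6.0 (P100) keeps FP16 operands.
    upcast = torch.cuda.get_device_capability(a.device) == (6, 1)
    c = torch.empty(a.shape, device=a.device, dtype=torch.float32)
    tile_dot_kernel[(1,)](a, b, c, UPCAST_TO_FP32=upcast, BLOCK=a.shape[0])
    return c
```

On an unpatched Triton for sm_6x, the FP16 tl.dot path is presumably where the Cannot convert f16 to f16 lowering error would surface; with the patch the upcast only kicks in when that lowering would otherwise fail.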