How's the speed compare with vanilla candle #99

MonolithFoundation · 2025-01-07T07:20:31Z

Candle supports both CUDA and CPU. How does its speed compare to the original?

guoqingbao · 2025-01-14T08:47:50Z

> Candle supports both CUDA and CPU. How does its speed compare to the original?

Candle-vLLM uses Paged Attention, which is well suited to long-sequence generation, and its performance is close to vLLM's. For example, it can deliver over 110 tokens/s for the Llama 3.1 8B model on an A100.
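
For readers unfamiliar with the idea, here is a minimal Python sketch of the bookkeeping behind paged attention: the KV cache is split into fixed-size blocks, and each sequence keeps a table mapping its logical blocks to physical ones, so memory is claimed only as a sequence grows. This is a toy illustration, not Candle-vLLM's or vLLM's actual code; the class `PagedKVCache`, its methods, and the block size of 16 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block allocator (hypothetical; illustrates paged KV-cache bookkeeping only)."""
    num_blocks: int
    block_size: int = 16  # tokens per block; illustrative value, not from the thread
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> list of physical block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> number of tokens stored

    def __post_init__(self):
        # Initially every physical block is free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve a slot for one new token and return (physical_block, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:
            # The last block is full (or the sequence is new): claim a fresh block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are allocated on demand and need not be contiguous, a long generation does not have to reserve its maximum length up front, which is why the technique suits long outputs.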
