Hi team! Thanks for the awesome work bringing Rust to the game!
I found that the usage of `LogitsProcessor` in the quantized example is not quite right: the chat loop re-processes tokens unnecessarily, which wastes work and hurts performance. People trying out the example may think the slowness comes from candle itself (e.g. "candle's performance sucks compared with llama.cpp" XD, I thought so too before reviewing the code carefully), since we are using a loop here:
candle/candle-examples/examples/quantized/main.rs, line 508 in 236c35e
We don't have to run inference over all the previous tokens again here:
candle/candle-examples/examples/quantized/main.rs, line 575 in 236c35e
Instead, we can move the `LogitsProcessor` out of the outer interactive loop and keep a cache of all tokens seen so far (including the users' prompts), so that each new turn only feeds the new tokens to the model:
candle/candle-examples/examples/quantized/main.rs, line 559 in 236c35e
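
For illustration, here is a rough sketch of what I mean (not a drop-in patch). It assumes candle's `LogitsProcessor` API (`new` + `sample`) and the `ModelWeights::forward(&Tensor, index_pos)` signature from `quantized_llama` that the example already uses; the exact shape handling and sampling options may differ from the current example. The caller would create `logits_processor` and `all_tokens` once, before the interactive loop, and call this for every user prompt:

```rust
use candle_core::{Device, Result, Tensor};
use candle_transformers::generation::LogitsProcessor;
use candle_transformers::models::quantized_llama::ModelWeights;

fn chat_turn(
    model: &mut ModelWeights,
    logits_processor: &mut LogitsProcessor, // created once, outside the interactive loop
    all_tokens: &mut Vec<u32>,              // cache of every token seen so far (prompts + generations)
    prompt_tokens: &[u32],                  // the new user prompt, already tokenized
    eos_token: u32,
    max_new_tokens: usize,
    device: &Device,
) -> Result<Vec<u32>> {
    // Feed only the *new* prompt tokens; `index_pos` tells the model where they
    // sit in the cached sequence, so the KV cache built on previous turns is
    // reused instead of re-running inference over the whole history.
    let index_pos = all_tokens.len();
    all_tokens.extend_from_slice(prompt_tokens);
    let input = Tensor::new(prompt_tokens, device)?.unsqueeze(0)?;
    let logits = model.forward(&input, index_pos)?.squeeze(0)?;
    let mut next_token = logits_processor.sample(&logits)?;
    all_tokens.push(next_token);

    let mut generated = vec![next_token];
    for _ in 0..max_new_tokens {
        if next_token == eos_token {
            break;
        }
        // Generate one token at a time, again passing the current offset.
        let input = Tensor::new(&[next_token], device)?.unsqueeze(0)?;
        let logits = model.forward(&input, all_tokens.len() - 1)?.squeeze(0)?;
        next_token = logits_processor.sample(&logits)?;
        all_tokens.push(next_token);
        generated.push(next_token);
    }
    Ok(generated)
}
```

The point is only the structure: the sampler and the token cache live outside the per-prompt loop, and each turn pays for its own new tokens rather than the full conversation.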
This could be related to #1939; the example is super slow from the second prompt onwards.