TinyLlama q4 works well too. #246
cosimoiaia started this conversation in General
Hi,
First of all, thanks for the great work on putting this app together.
I tested it on my Pocophone (the original model, now five years old) with a personal 7B model fine-tuned from Mistral. It ran at q8, but it was obviously too slow at ~25 seconds per token.
Then I tested TinyLlama at q4, which actually works pretty well at around 5-8 tokens/second, so it is very usable.
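For a sense of scale, the two speeds can be put in the same units. This is a quick back-of-the-envelope sketch using only the figures reported in this post (one phone, informal observations, not benchmarks); the function and variable names are just illustrative.

```python
# Rough speed comparison using the figures reported in this post.
# These are informal observations on one phone, not benchmarks.

def tokens_per_second(seconds_per_token: float) -> float:
    """Convert a latency figure (seconds per token) into throughput."""
    return 1.0 / seconds_per_token

mistral_7b_q8 = tokens_per_second(25.0)  # ~25 s/token -> 0.04 tokens/s
tinyllama_q4_range = (5.0, 8.0)          # reported tokens/second

# How many times faster TinyLlama q4 is at each end of the reported range
speedups = [rate / mistral_7b_q8 for rate in tinyllama_q4_range]

print(f"7B q8: {mistral_7b_q8:.2f} tokens/s")
print(f"TinyLlama q4 speedup: {speedups[0]:.0f}x to {speedups[1]:.0f}x")
```

That works out to roughly a 125x to 200x difference, which is the gap between "too slow" and "very usable" on this device.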
Looking forward to future improvements to the app! 👍