how to enable llama3-8b int4 awq models #90

Open
FlexLaughing opened this issue Aug 29, 2024 · 0 comments

Comments

@FlexLaughing
Hi,
I have an AutoAWQ-quantized model (--wbits=4 --groupsize=128) and am running the perplexity evaluation on a GPU with:
--model /home/ubuntu/qllm_v0.2.0_Llama3-8B-Chinese-Chat_q4 --epochs 0 --eval_ppl --wbits 4 --abits 16 --lwc --net llama-7b
I hit an error while the model is being parsed in https://github.com/OpenGVLab/OmniQuant/blob/main/quantize/int_linear.py#L26
It seems the QuantLinear definition does not support the packed qweight tensors produced by AutoAWQ. Please check the arguments. Thanks!
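
For reference, a minimal sketch of the mismatch, assuming the usual AutoAWQ export layout (packed qweight / qzeros / scales per linear layer) and a single pytorch_model.bin file inside the model directory; the file name and key prefix below are assumptions, not taken from the repo:

```python
import torch

# A minimal sketch, assuming the quantized checkpoint is a single
# pytorch_model.bin (adjust the file name or use safetensors loading
# if the export differs).
ckpt = "/home/ubuntu/qllm_v0.2.0_Llama3-8B-Chinese-Chat_q4/pytorch_model.bin"
state_dict = torch.load(ckpt, map_location="cpu")

# Inspect one projection layer: an AutoAWQ export typically carries packed
# int tensors (qweight / qzeros / scales) instead of a float "weight" tensor.
for name, t in state_dict.items():
    if "layers.0.self_attn.q_proj" in name:
        print(name, tuple(t.shape), t.dtype)

# OmniQuant's QuantLinear wraps an ordinary float nn.Linear and quantizes its
# .weight on the fly, so there is no float weight here for it to consume.
# The packed qweight/qzeros/scales would need to be dequantized back to fp16
# (for example with AutoAWQ's own unpacking utilities) before the --eval_ppl
# path can run on this checkpoint.
```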
