How to evaluate AWQ? #1980
Comments
Hello, @chunniunai220ml
Thanks for your reply. I followed the 2.x example link; the bash script is as follows: As the README.md says, weight-only quantization is based on fake quantization, so why is the qmodel saved at #L338? I think the qmodel weights are not stored as INT4.
Sure, the q_model needs to be exported as a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it already integrates this compressed-model export. The 3.x API is coming soon, stay tuned.
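To illustrate the distinction discussed above, here is a minimal pure-Python sketch of why a fake-quantized model still holds floating-point weights and what an export step to packed INT4 has to do. This is not Neural Compressor's implementation: the function names (`rtn_quantize`, `pack_int4`, and so on), the asymmetric round-to-nearest scheme, and the low-nibble-first packing layout are all assumptions chosen for illustration.

```python
def rtn_quantize(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group (illustrative)."""
    qmax = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0  # fall back to 1.0 for a constant group
    zero_point = round(-w_min / scale)
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Map integer codes back to floats; this is what fake quantization keeps."""
    return [(c - zero_point) * scale for c in codes]


def pack_int4(codes):
    """Pack pairs of 4-bit codes into bytes, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even number of codes
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_int4(packed, count):
    """Inverse of pack_int4; returns the first `count` codes."""
    codes = []
    for byte in packed:
        codes.extend((byte & 0x0F, byte >> 4))
    return codes[:count]


weights = [-0.8, -0.3, 0.0, 0.1, 0.25, 0.4, 0.55, 0.7]
codes, scale, zero_point = rtn_quantize(weights)

# Fake quantization: values are snapped to the INT4 grid but remain floats.
fake_quant = dequantize(codes, scale, zero_point)

# Compressed export: only the packed codes plus scale and zero point are stored.
packed = pack_int4(codes)
print(f"{len(weights)} weights -> {len(packed)} packed bytes")
```

Saving the fake-quantized weights preserves the accuracy behaviour of the quantized model but not the storage savings; those only appear once the integer codes are packed, which is what an export step provides.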
Does it work well on an NVIDIA V100? The README.md seems to describe only the Intel GPU installation. Besides, when running on CPU, it is strange that the process is always killed for no apparent reason after processing several blocks.
I suggest you try the 3.x API, where q_model is the exported compressed model. We will soon update the 3.x example, which supports automatic device detection.
I checked out the kaihui/woq_3x_eg branch and ran: but hit another bug in eval: Also, how do I load saved_results/quantmodel.pt to evaluate?
Hi @chunniunai220ml, trying an older version such as 2.6 may solve this issue:
https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples
How do I set eval_func?
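As a rough sketch of the shape such a function usually takes: in the 2.x API, `eval_func` is, as far as the documentation describes, a callable that receives the model and returns a single scalar metric where higher is better. The example below is self-contained and uses hypothetical stand-ins (`make_eval_func`, `toy_model`, and the tiny dataset are all invented for illustration); for a language model the metric would more likely come from a benchmark harness or a perplexity-derived score.

```python
def make_eval_func(samples, labels):
    """Build an eval_func(model) -> float closure over a fixed evaluation set."""
    def eval_func(model):
        # Accuracy: fraction of samples where the model output matches the label.
        correct = sum(1 for x, y in zip(samples, labels) if model(x) == y)
        return correct / len(labels)
    return eval_func


def toy_model(x):
    """Hypothetical stand-in for a (quantized) model; any callable works here."""
    return int(x > 0)


eval_func = make_eval_func([-2.0, -1.0, 0.5, 3.0], [0, 0, 1, 0])
accuracy = eval_func(toy_model)
print(f"accuracy = {accuracy:.2f}")

# With neural_compressor 2.x, a function of this shape would typically be
# passed as the eval_func argument of quantization.fit (not executed here).
```

Wrapping the dataset in a closure keeps the signature down to a single `model` argument, which is the form a tuning loop can call repeatedly on candidate models.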
https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py
It seems there is no AWQ quantization there, only RTN and GPTQ. And as the README.md says, weight-only quantization is fake quantization, so why save the qmodel (user_model.save(args.output_dir))?