
How to enable support for AWQ? #736

Open
Pradeepa99 opened this issue Nov 26, 2024 · 3 comments

@Pradeepa99

Describe the issue

I am trying to enable AWQ support with the IPEX repo on CPU.

The IPEX 2.5.0 release notes state that AWQ quantization is supported.

However, only GPTQ support appears to have been added in the official repo.

The script below states that it is deprecated and recommends using INC (Intel Neural Compressor) instead:
https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.5.10/examples/cpu/llm/inference/utils/run_gptq.py

What is the correct approach to enable AWQ support with the IPEX repo?

Config used:

  • Python - 3.9
  • IPEX - 2.5.0
  • Build type: release
  • Torch - 2.5.0
  • Transformers - 4.43.2
@alexsin368 self-assigned this Nov 27, 2024
@alexsin368 (Contributor)

@Pradeepa99 The release notes mention added support for the AWQ format. This appears to refer to the use of ipex.llm.optimize, where you can specify quant_method as 'gptq' or 'awq' in the config passed via the low_precision_checkpoint argument.

Details here: https://intel.github.io/intel-extension-for-pytorch/cpu/2.5.0+cpu/tutorials/api_doc.html#ipex.llm.optimize

Let us know if this helps put you on the right track.
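For reference, a minimal sketch of that flow. This is an assumption-laden illustration, not a verified recipe: the model name, the checkpoint path, and the checkpoint-config keys (including "quant_method") are inferred from the API doc linked above.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Load the FP32 model to be optimized (placeholder model name).
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float32)
model.eval()

# Weight-only quantization recipe with INT4 weights.
qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=ipex.quantization.WoqWeightDtype.INT4,
)

# State dict produced by an external AWQ tool (hypothetical path), plus a
# config dict telling IPEX which keys hold the packed weights / scales /
# zero points and which quant method produced the checkpoint.
state_dict = torch.load("awq_checkpoint.pt")
checkpoint_config = {
    "weight_key": "qweight",
    "scale_key": "scales",
    "zero_point_key": "qzeros",
    "bias_key": "bias",
    "quant_method": "awq",  # assumption: key name inferred from the release notes
}

# Apply IPEX LLM optimizations using the pre-quantized checkpoint.
model = ipex.llm.optimize(
    model,
    dtype=torch.float32,
    quantization_config=qconfig,
    low_precision_checkpoint=(state_dict, checkpoint_config),
)
```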

@Pradeepa99 (Author) commented Nov 28, 2024

@alexsin368

Thank you for sharing this.

I have three follow-up questions I'd like clarified:

  1. I found this testcase example that loads the AWQ format into the ipex.llm.optimize API. Is this the approach you meant for integrating AWQ support in ipex.llm.optimize?

  2. I found this example for GPTQ, where ipex.quantization.gptq is used to generate the GPTQ checkpoint. Is there a similar API to generate checkpoints in the AWQ format?

  3. Currently, I am following the approach from ITREX to generate the quantized model, using this file (a sketch of that flow follows this list):
    https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation_cpu_woq.py
    Can we quantize models this way, or is there a specific approach we should follow?
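For context, here is a minimal sketch of the ITREX flow referenced in question 3. The model name and the AwqConfig parameters are assumptions; the actual script wires these up from CLI arguments.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, AwqConfig

model_name = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit AWQ weight-only quantization config; parameter values are assumptions.
quant_config = AwqConfig(bits=4, group_size=128)

# Quantize at load time, then save the weight-only quantized model.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quant_config)
model.save_pretrained("./opt-125m-awq")
tokenizer.save_pretrained("./opt-125m-awq")
```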

@alexsin368 added the CPU, Feature, and LLM labels Dec 4, 2024
@alexsin368 (Contributor)

@Pradeepa99 Yes, the testcase example you found is what I meant. IPEX does not have an AWQ equivalent of the GPTQ checkpoint-generation example you found.

We recommend using Intel Neural Compressor (INC) if you wish to use AWQ.
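A minimal sketch of what the INC path might look like, using INC's 2.x-style weight-only quantization API. The model, the calibration data, and the exact config keys are assumptions; check the INC documentation for the current API.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# AWQ needs calibration data; a tiny toy dataloader stands in here.
calib_texts = ["Hello world.", "Weight-only quantization calibration sample."]
calib_samples = [tokenizer(t, return_tensors="pt").input_ids.squeeze(0) for t in calib_texts]
calib_dataloader = DataLoader(calib_samples, batch_size=1)

# Weight-only quantization with the AWQ algorithm (INT4, per-group).
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all op types
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "AWQ",
            },
        },
    },
)

q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_awq_model")
```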
