
How to enable support for AWQ? #736

Open
Pradeepa99 opened this issue Nov 26, 2024 · 3 comments

@Pradeepa99

Describe the issue

I am trying to enable AWQ support with the IPEX repo on CPU.

The IPEX 2.5.0 release notes state that AWQ quantization is supported.

However, only GPTQ support appears to have been added in the official repo.

The script below states that it is deprecated and recommends using INC (Intel Neural Compressor) instead:
https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.5.10/examples/cpu/llm/inference/utils/run_gptq.py

What is the correct approach to enable AWQ support with the IPEX repo?

Config used:

  • Python - 3.9
  • IPEX - 2.5.0
  • Build type: release
  • Torch - 2.5.0
  • Transformers - 4.43.2
@alexsin368 self-assigned this Nov 27, 2024
@alexsin368 (Contributor)

@Pradeepa99 The release notes mention added support for the AWQ format. This appears to refer to the use of ipex.llm.optimize, where you can specify quant_method as 'gptq' or 'awq' in the config passed via the low_precision_checkpoint argument.

Details here: https://intel.github.io/intel-extension-for-pytorch/cpu/2.5.0+cpu/tutorials/api_doc.html#ipex.llm.optimize

Let us know if this helps put you on the right track.
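For reference, a minimal sketch of that flow. This is an assumption-laden illustration, not a verified recipe: the model name, the checkpoint path, and the checkpoint-config keys (including "quant_method") are inferred from the API doc linked above.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Load the FP32 model to be optimized (placeholder model name).
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float32)
model.eval()

# Weight-only quantization recipe with INT4 weights.
qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=ipex.quantization.WoqWeightDtype.INT4,
)

# State dict produced by an external AWQ tool (hypothetical path), plus a
# config dict telling IPEX which keys hold the packed weights / scales /
# zero points and which quant method produced the checkpoint.
state_dict = torch.load("awq_checkpoint.pt")
checkpoint_config = {
    "weight_key": "qweight",
    "scale_key": "scales",
    "zero_point_key": "qzeros",
    "bias_key": "bias",
    "quant_method": "awq",  # assumption: key name inferred from the release notes
}

# Apply IPEX LLM optimizations using the pre-quantized checkpoint.
model = ipex.llm.optimize(
    model,
    dtype=torch.float32,
    quantization_config=qconfig,
    low_precision_checkpoint=(state_dict, checkpoint_config),
)
```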

@Pradeepa99 (Author) commented Nov 28, 2024

@alexsin368

Thank you for sharing this.

I have three follow-up questions I'd like clarified:

  1. I found this testcase example that loads the AWQ format into the ipex.llm.optimize API. Is this the approach you meant for integrating AWQ support in ipex.llm.optimize?

  2. I found this example for GPTQ, where ipex.quantization.gptq is used to generate the GPTQ checkpoint. Is there a similar API to generate checkpoints in the AWQ format?

  3. Currently, I am following the approach from ITREX to generate the quantized model, using this file (a sketch of that flow follows this list):
    https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation_cpu_woq.py
    Can we quantize models this way, or is there a specific approach we should follow?
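For context, here is a minimal sketch of the ITREX flow referenced in question 3. The model name and the AwqConfig parameters are assumptions; the actual script wires these up from CLI arguments.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, AwqConfig

model_name = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit AWQ weight-only quantization config; parameter values are assumptions.
quant_config = AwqConfig(bits=4, group_size=128)

# Quantize at load time, then save the weight-only quantized model.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quant_config)
model.save_pretrained("./opt-125m-awq")
tokenizer.save_pretrained("./opt-125m-awq")
```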

@alexsin368 added the CPU, Feature, and LLM labels Dec 4, 2024
@alexsin368 (Contributor)

@Pradeepa99 Yes, the testcase example you found is what I meant. IPEX does not have an AWQ equivalent of the GPTQ checkpoint-generation example you found.

We recommend using Intel Neural Compressor (INC) if you wish to use AWQ.
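A minimal sketch of what the INC path might look like, using INC's 2.x-style weight-only quantization API. The model, the calibration data, and the exact config keys are assumptions; check the INC documentation for the current API.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# AWQ needs calibration data; a tiny toy dataloader stands in here.
calib_texts = ["Hello world.", "Weight-only quantization calibration sample."]
calib_samples = [tokenizer(t, return_tensors="pt").input_ids.squeeze(0) for t in calib_texts]
calib_dataloader = DataLoader(calib_samples, batch_size=1)

# Weight-only quantization with the AWQ algorithm (INT4, per-group).
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all op types
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "AWQ",
            },
        },
    },
)

q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_awq_model")
```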
