[Feature Request] Support more popular compression algorithms and highly optimized kernels on the CPU #264
Comments
Hi @yiliu30,

Hi @StochasticRomanAgeev, thanks for your reply. There are several improvements:

Thanks for the PR!

I am talking about this if branch; you just need to integrate your code there.

Thanks, and I agree with your suggestion; we will work on a new PR for it soon :)

Hi @StochasticRomanAgeev, #268 is the initial implementation following your suggestion, please take your time to review it.
xTuring is known for its efficient and straightforward fine-tuning support for popular LLMs, but it is missing features related to popular compression algorithms, particularly weight-only quantization. These compression methods are widely acknowledged for their efficiency and are commonly adopted in industry. Furthermore, xTuring has limited support for CPU-side optimization.
Our team developed these quantization algorithms in Intel® Neural Compressor and Intel-Extension-for-Transformers, and we would like to integrate them into xTuring. This integration aims to bring these widely used weight-only quantization algorithms to xTuring and to provide highly optimized kernels on the CPU.
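For context, weight-only quantization keeps activations in floating point and quantizes only the weight matrices, typically with low-bit integers and per-group scales. Below is a minimal NumPy illustration of round-to-nearest (RTN) group-wise quantization; it only shows the idea and is not the Intel® Neural Compressor implementation.

```python
# Minimal illustration of round-to-nearest (RTN) weight-only quantization
# with per-group scales. Not the Intel Neural Compressor implementation.
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4, group_size: int = 32) -> np.ndarray:
    """Quantize a 2-D weight matrix group-wise and return the dequantized copy."""
    qmax = 2 ** (bits - 1) - 1          # symmetric range, e.g. [-8, 7] for 4-bit
    qmin = -(2 ** (bits - 1))
    out = np.empty_like(w, dtype=np.float32)
    _, cols = w.shape
    for start in range(0, cols, group_size):
        group = w[:, start:start + group_size]
        # One scale per row per group, so the max magnitude maps to qmax.
        scale = np.abs(group).max(axis=1, keepdims=True) / qmax
        scale = np.where(scale == 0, 1.0, scale)        # avoid division by zero
        q = np.clip(np.round(group / scale), qmin, qmax)
        out[:, start:start + group_size] = q * scale    # dequantized values
    return out

w = np.random.randn(8, 64).astype(np.float32)
w_q = rtn_quantize(w, bits=4, group_size=32)
print("mean abs error:", np.abs(w - w_q).mean())
```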
Usage
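A rough sketch of what the user-facing usage could look like. The class entry point, the `weight_only_quant_config` argument, and its fields are placeholders for illustration only; the actual API will be defined in the follow-up PR (#268).

```python
# Hypothetical usage sketch -- argument names below are placeholders,
# not the final xTuring API.
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import GenericModel  # assumed entry point

dataset = InstructionDataset("./alpaca_data")

# Assumed: a weight-only quantization config (e.g. 4-bit RTN/GPTQ/AWQ)
# applied when loading the model, with CPU-optimized kernels at inference.
model = GenericModel(
    "facebook/opt-1.3b",
    weight_only_quant_config={"bits": 4, "group_size": 32, "algorithm": "RTN"},
)

output = model.generate(texts=["What is weight-only quantization?"])
print(output)
```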
Supported Scope
All currently supported models.
Plan
@StochasticRomanAgeev @tushar2407