Gibberish results for non-disabled "faster_mode" using "vicuna-7B-GPTQ-4bit-128g" model #127
I checked how a single linear layer behaves under each mode, and yes, there are significant mean absolute errors (MAE) between all three modes: disabled / faster / old_faster.
So while most of the layer outputs lie within the -2.0 ... 2.0 range, the MAE between the different methods can be up to 1 (I'm not sure whether that would be expected for quantization itself, but I doubt we should expect it between different calculation methods applied to the same quantized weights).
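To make the magnitude concrete: with toy numbers (not taken from the real model), an MAE near 1 against outputs spanning roughly -2.0 to 2.0 is a very large relative error. A minimal sketch:

```python
import numpy as np

# Toy numbers, purely illustrative: outputs of the same linear layer
# under two modes, chosen so most values lie in the -2.0 ... 2.0 range.
out_disabled = np.array([0.10, -1.50, 0.75, 1.90])
out_faster   = np.array([0.90, -0.55, 1.60, 1.05])  # hypothetical "faster" outputs

def mae(a, b):
    """Mean absolute error between two layer outputs."""
    return float(np.mean(np.abs(a - b)))

err = mae(out_disabled, out_faster)
rel = err / (out_disabled.max() - out_disabled.min())  # error relative to output range
print(err, rel)
```

Here the MAE is a sizable fraction of the whole output range, which is consistent with the layer producing effectively unrelated activations rather than a small quantization-level discrepancy.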
Currently the faster kernel does not support models using act-order, because act-order requires random access into qzeros via g_idx. Also, using the non-act-order kernel on a model with act-order may produce inf or nan. I think you can compare the result from _matmul4bit_v2_recons with the act-order kernel (faster disabled).
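The distinction can be sketched in plain numpy (this is an illustration of the access pattern, not the real CUDA kernel): without act-order, row i of a weight column belongs to group i // groupsize, so scales and zero-points are read sequentially; with act-order, row i belongs to group g_idx[i], a random-access pattern. A kernel that assumes sequential groups then reads the wrong scale/zero-point for most rows.

```python
import numpy as np

groupsize = 4
qweight = np.array([3, 12, 5, 9, 14, 1, 7, 10])  # 4-bit values (0..15), one column
scales  = np.array([0.01, 0.02])                 # one scale per group
zeros   = np.array([8, 7])                       # one zero-point per group

def dequant(q, g):
    """Dequantize: subtract the group's zero-point, scale by the group's scale."""
    return (q - zeros[g]) * scales[g]

sequential_groups = np.arange(len(qweight)) // groupsize  # [0 0 0 0 1 1 1 1]
g_idx = np.array([1, 0, 1, 0, 1, 0, 1, 0])                # an act-order-style reordering

w_plain = dequant(qweight, sequential_groups)
w_act   = dequant(qweight, g_idx)
print(np.allclose(w_plain, w_act))  # False: wrong groups give wrong weights
```

With mismatched groups every affected weight is reconstructed with the wrong scale and zero-point, so the resulting matmul outputs are effectively garbage rather than slightly off.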
Yeah, but in all these cases act-order is not involved: the kernels are non-act-order, and the model is a non-act-order model as well.
Okay, I will look into the difference.
Can't reproduce the issue using a fresh setup and the latest version.
After fixing #124 I continued debugging my issues.
So I am still using this model: https://huggingface.co/TheBloke/vicuna-7B-GPTQ-4bit-128g
But I was getting gibberish results by default. For example, "What is the meaning of life" -> "As you лта :tinsarder,tatdenS-L-one-0".
But since I was previously using an old version of this library, and since https://github.com/alex4321/alpaca_lora_4bit/blame/winglian-setup_pip/src/alpaca_lora_4bit/matmul_utils_4bit.py shows that act_order (which was mentioned in the previous issue) was introduced in one of the relatively late updates, I decided to check what the other changes (regarding "faster_mode") would affect. So I made the following notebook:
https://github.com/alex4321/alpaca_lora_4bit/blob/test-different-faster-modes/test.ipynb
And it seems like (in my setup) non-disabled faster_mode gives me gibberish results (with this model).
P.S. I have not checked Linux environments such as Colab yet; I will probably do that later, along with digging into the differences between the algorithms, e.g. whether they should give exactly the same result or not.
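The comparison the notebook performs can be sketched as a small harness: sweep the three faster_mode settings and report each mode's MAE against the "disabled" baseline. Names like `run_layer` below are placeholders, not the library's actual API; a stub layer stands in for the real quantized linear layer.

```python
import numpy as np

MODES = ("disabled", "faster", "old_faster")

def compare_modes(run_layer, x):
    """MAE of each mode's output against the 'disabled' baseline."""
    baseline = run_layer(x, mode="disabled")
    return {m: float(np.mean(np.abs(run_layer(x, mode=m) - baseline)))
            for m in MODES}

# Stub layer just to exercise the harness; the real notebook would call
# the library's quantized linear layer under each faster_mode setting.
def run_layer(x, mode):
    offset = {"disabled": 0.0, "faster": 0.5, "old_faster": 0.25}[mode]
    return x + offset

report = compare_modes(run_layer, np.zeros(4))
print(report)
```

A near-zero MAE for every mode would mean the kernels agree up to numerical noise; a large MAE for a specific mode points at that kernel as the source of the gibberish.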