
Gibberish results for non-disabled "faster_mode" using "vicuna-7B-GPTQ-4bit-128g" model #127

Open
alex4321 opened this issue Jun 26, 2023 · 4 comments


@alex4321
Contributor

alex4321 commented Jun 26, 2023

After fixing #124 I continued debugging my issues.

So I am still using this model: https://huggingface.co/TheBloke/vicuna-7B-GPTQ-4bit-128g

But by default I was getting gibberish results, e.g. "What is the meaning of life" -> "As you лта :tinsarder,tatdenS-L-one-0".

But since I had previously been using an old version of this library, and https://github.com/alex4321/alpaca_lora_4bit/blame/winglian-setup_pip/src/alpaca_lora_4bit/matmul_utils_4bit.py shows that act_order (which was mentioned in the previous issue) was introduced in one of the relatively recent updates, I decided to check what effect the other changes (regarding "faster_mode") would have.

So I made the following notebook:
https://github.com/alex4321/alpaca_lora_4bit/blob/test-different-faster-modes/test.ipynb

And it seems that, in my setup and with this model, any non-disabled faster_mode gives gibberish results:

disable 0 As an AI language model, I don't have personal beliefs or opinions, but I can provide some insights
disable 1 As an AI language model, I don't have personal beliefs or opinions, but I can provide some insights
disable 2 As an AI language model, I don't have personal beliefs or opinions, but I can provide some insights
disable 3 As an AI language model, I don't have personal beliefs or opinions, but I can provide some insights
disable 4 As an AI language model, I don't have personal beliefs or opinions, but I can provide some insights
faster 0 As

igo Sen 16-92, 5-one 0, Gothe interested on tche
faster 1 As

че

ea/etoereiched
PrivateBorn house derber Case3Original themesam
faster 2 As

igo Sen 16-year-break
- 3-Names no 2-parts-off
faster 3 As

igo Sen 16-92, 5-one 0 0 se  in turn-
faster 4 As

igo Sen 16-92 (in
AlversAjutoCor condenrelsent failure
old_faster 0 As
 you

лта
AAitinkenment proteadata-vadorvers Fortle Mattletut,-
old_faster 1 As
 you



 SinnestroRel12sin,Mv vvardughesMan Man
old_faster 2 As
 you

лта


:
itinsarder,tatdenS-L-one-0
old_faster 3 As
 you

лта


:
itinsarder,thesavoald.toAd S-

old_faster 4 As
 you

лта


:
itinsarder,lung20ranwards,
-

P.S. I have not checked Linux environments such as Colab yet; I will probably do that later, along with digging into the differences between the algorithms, e.g. whether they should give exactly the same results or not.
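
For reference, a minimal sketch of the loop behind the output above (load_model and generate are hypothetical stand-ins for the notebook's actual model-loading and generation code; only the three faster_mode values come from this thread):

```python
# Minimal sketch of the per-mode test loop, assuming test.ipynb's setup.
# load_model / generate are hypothetical helpers, not alpaca_lora_4bit API;
# the three mode names come from the labels in the output above.
import alpaca_lora_4bit.matmul_utils_4bit as matmul_utils_4bit

PROMPT = "What is the meaning of life"

model, tokenizer = load_model("TheBloke/vicuna-7B-GPTQ-4bit-128g")  # hypothetical

for mode in ("disable", "faster", "old_faster"):
    matmul_utils_4bit.faster_mode = mode  # switch the 4-bit matmul kernel globally
    for trial in range(5):
        print(mode, trial, generate(model, tokenizer, PROMPT))  # hypothetical
```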

@alex4321
Contributor Author

I checked the difference in the way one linear layer works:
https://github.com/alex4321/alpaca_lora_4bit/blob/test-different-faster-modes/test-matmul.ipynb

And, yeah, there is a significant MAE between all the modes - disabled / faster / old_faster:

| Run | MAE disabled vs faster | MAE disabled vs old_faster | MAE faster vs old_faster | Disabled output (5%-95% quantiles) |
|-----|------------------------|----------------------------|--------------------------|------------------------------------|
| 1   | 1.0654296875           | 0.86083984375              | 0.90478515625            | -2.06591796875 … 2.02783203125     |
| 2   | 1.0927734375           | 0.93994140625              | 0.86962890625            | -2.06787109375 … 2.0029296875      |
| 3   | 1.20703125             | 0.9873046875               | 0.99951171875            | -1.97216796875 … 2.03076171875     |
| 4   | 1.0576171875           | 0.85595703125              | 0.86328125               | -1.88232421875 … 1.8505859375      |
| 5   | 1.115234375            | 0.98388671875              | 0.97265625               | -1.98876953125 … 1.958251953125    |
| 6   | 1.1455078125           | 0.87109375                 | 0.92919921875            | -2.00439453125 … 2.01318359375     |
| 7   | 1.19140625             | 0.98779296875              | 0.90869140625            | -1.967041015625 … 2.01416015625    |
| 8   | 1.025390625            | 0.90966796875              | 0.880859375              | -2.080078125 … 2.04296875          |
| 9   | 1.0478515625           | 0.9462890625               | 0.90869140625            | -2.04931640625 … 2.099609375       |
| 10  | 1.0419921875           | 0.94677734375              | 0.87158203125            | -1.9267578125 … 1.913330078125     |

So while most of the layer outputs lie within the -2.0 … 2.0 range, the MAE between the different methods can be up to ~1 (such error might be expected from quantization itself, but I doubt we should expect it between different calculation methods applied to the same quantized weights?).
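
For context, a hedged sketch of the per-layer comparison in test-matmul.ipynb (assumptions, not verified against the notebook: quant_layer is one 4-bit linear layer taken from the loaded model, and setting matmul_utils_4bit.faster_mode switches the kernel):

```python
# Hedged sketch of the MAE comparison between kernels for a single layer.
# quant_layer is a hypothetical stand-in for one quantized layer of the model.
import torch
import alpaca_lora_4bit.matmul_utils_4bit as matmul_utils_4bit

x = torch.randn(1, quant_layer.in_features, dtype=torch.float16, device="cuda")

outs = {}
for mode in ("disable", "faster", "old_faster"):
    matmul_utils_4bit.faster_mode = mode
    with torch.no_grad():
        outs[mode] = quant_layer(x).float().cpu()

def mae(a, b):
    return (outs[a] - outs[b]).abs().mean().item()

print("DISABLED-FASTER", mae("disable", "faster"))
print("DISABLED-OLD FASTER", mae("disable", "old_faster"))
print("FASTER-OLD FASTER", mae("faster", "old_faster"))
q = torch.quantile(outs["disable"].flatten(), torch.tensor([0.05, 0.95]))
print("DISABLED OUTPUT (5% - 95% quantiles)", q[0].item(), q[1].item())
```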

@johnsmith0031
Owner

johnsmith0031 commented Jun 26, 2023

Currently the faster kernel does not support models using act-order, because act-order requires random access into qzeros by g_idx.
Random access in VRAM would slow down the whole computation, so there would be some performance loss.

Also, using a non-act-order kernel on a model with act-order may produce inf or nan.

I think you can compare the results from _matmul4bit_v2_recons and the act_order kernel (faster disabled).
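
To make the random-access point concrete, a pure-Python illustration (not the actual CUDA kernel) of the index pattern act-order creates:

```python
# Illustration only, not the actual CUDA kernel. Without act-order, row i uses
# group i // groupsize, so qzeros/scales are read sequentially (coalesced).
# With act-order, a per-row g_idx permutation picks the group, so reads scatter.
import torch

rows, groupsize = 16, 4

sequential_groups = torch.arange(rows) // groupsize   # 0,0,0,0,1,1,1,1,...
g_idx = sequential_groups[torch.randperm(rows)]       # act-order-style shuffle

print("sequential:", sequential_groups.tolist())  # coalesced VRAM reads
print("act-order: ", g_idx.tolist())              # random access -> slower on GPU
```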

@alex4321
Contributor Author

alex4321 commented Jun 26, 2023

Yeah, but in all these cases act-order is disabled (and the model is a non-act-order one):

alpaca_lora_4bit.matmul_utils_4bit.act_order = False

Okay, I'll check the difference.
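
A hedged sketch of that comparison, done by toggling the module-level flags named in this thread rather than calling _matmul4bit_v2_recons directly (assumption, not verified: the act_order flag selects between the act_order kernel and the reconstruction path):

```python
# Hedged sketch: act_order kernel vs. the _matmul4bit_v2_recons path, faster disabled.
# quant_layer is a hypothetical stand-in for one quantized layer of the model.
import torch
import alpaca_lora_4bit.matmul_utils_4bit as matmul_utils_4bit

matmul_utils_4bit.faster_mode = "disable"
x = torch.randn(1, quant_layer.in_features, dtype=torch.float16, device="cuda")

matmul_utils_4bit.act_order = True   # assumed: act_order kernel path
with torch.no_grad():
    out_act_order = quant_layer(x).float()

matmul_utils_4bit.act_order = False  # assumed: _matmul4bit_v2_recons path
with torch.no_grad():
    out_recons = quant_layer(x).float()

print("MAE:", (out_act_order - out_recons).abs().mean().item())
```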

@alex4321
Contributor Author

I can't reproduce the issue using a fresh setup and the latest winglian-setup_pip branch. So recreating the environment with the latest version of winglian-setup_pip may help anyone facing a similar issue:

disable 0 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
disable 1 As an AI language model, I don't have personal beliefs or opinions, but I can provide some perspect
disable 2 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
disable 3 As an AI language model, I don't have personal beliefs or opinions, but

The post The Mean
disable 4 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
faster 0 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
faster 1 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
faster 2 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
faster 3 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
faster 4 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
old_faster 0 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
old_faster 1 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
old_faster 2 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
old_faster 3 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
old_faster 4 As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is
