
Move quantization after LoRA surgery #828

Closed · wants to merge 4 commits
Conversation

@caroteu (Contributor) commented Jan 11, 2025

@anwai98 I noticed that, the way we were doing quantization, we did not quantize the LoRA matrices (the A and B matrices). Was that intended? If not, this should be fixed by applying the quantization after the LoRA surgery.

I also added code to initialize the quantized model for inference.

caroteu requested a review from anwai98 on January 11, 2025, 20:52
linear_q4bit.bias = torch.nn.Parameter(bias_data)

# Replace the original linear layer with the quantized one
setattr(parent_module, layer_name, linear_q4bit)
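
For context, the excerpt above is the tail end of the layer-replacement step. Below is a minimal sketch of what a complete post-surgery quantization pass could look like, assuming bitsandbytes provides the 4-bit layers; the helper name `replace_linear_with_4bit` and the exact weight-wrapping details are illustrative, not the code from this PR.

```python
import torch
import bitsandbytes as bnb


def replace_linear_with_4bit(model, compute_dtype=torch.float16):
    """Swap every torch.nn.Linear in `model` for a bnb.nn.Linear4bit layer.

    Running this *after* the LoRA surgery means the freshly inserted A/B
    linear layers are also visited -- whether they should actually be
    quantized is exactly the question raised in this thread.
    """
    # Snapshot the module list first, so newly inserted layers are not revisited.
    for name, module in list(model.named_modules()):
        if not isinstance(module, torch.nn.Linear):
            continue

        # Resolve the parent module so we can overwrite the attribute in place.
        parent_name, _, layer_name = name.rpartition(".")
        parent_module = model.get_submodule(parent_name) if parent_name else model

        linear_q4bit = bnb.nn.Linear4bit(
            module.in_features,
            module.out_features,
            bias=module.bias is not None,
            compute_dtype=compute_dtype,
            quant_type="nf4",
        )
        # Wrap the existing float weights; the actual 4-bit quantization happens
        # when the layer is moved to the GPU.
        linear_q4bit.weight = bnb.nn.Params4bit(
            module.weight.data, requires_grad=False, quant_type="nf4"
        )
        if module.bias is not None:
            linear_q4bit.bias = torch.nn.Parameter(module.bias.data)

        # Replace the original linear layer with the quantized one.
        setattr(parent_module, layer_name, linear_q4bit)

    return model
```

With this ordering, the pass would be called after the LoRA surgery (e.g. first inject the LoRA layers, then call `replace_linear_with_4bit(model)`), so the newly added linear layers are included in the quantization sweep.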

@anwai98 (Contributor) commented Jan 11, 2025

Hi @caroteu,

> we did not quantize the LoRA matrices (A and B matrices).

I see. I checked it out, and I realized there were some minor details I missed as well (e.g. quantizing only the linear layers for the A and B matrices).

With the latest updates, training with mixed precision no longer works for me! (Let me know if that's not the case for you.)

And regarding the inference loading scripts, we should probably discuss them on Monday. I am a bit skeptical about the current additions!

@anwai98 (Contributor) commented Jan 12, 2025

Okay, I think I figured it out. The issue is that the module layers used in the forward pass have to align with the ones seen in the backward pass, and the safest way to ensure this is to register the right modules in the __init__ section of the surgery modules.
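
To illustrate the point (a minimal sketch, not the code from this PR): in a hand-rolled LoRA wrapper, every sub-module is created and registered in __init__, so the forward pass uses exactly the modules that autograd tracks in the backward pass. The class and attribute names below are illustrative.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Sketch of a LoRA surgery module; names are illustrative, not the PR's code.

    All sub-modules are created and registered here in __init__, so the exact
    same (possibly quantized) base layer and A/B linears are used in the
    forward pass and seen by autograd in the backward pass. Creating layers
    on the fly inside forward() is what breaks that alignment.
    """

    def __init__(self, base_layer: nn.Module, in_features: int, out_features: int, rank: int = 4):
        super().__init__()
        self.base_layer = base_layer                      # e.g. a frozen or 4-bit quantized linear
        self.lora_a = nn.Linear(in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)                # the LoRA update starts as a no-op

    def forward(self, x):
        # Only modules registered above are used; nothing is created on the fly here.
        return self.base_layer(x) + self.lora_b(self.lora_a(x))
```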

And thanks to @caroteu, we have a remarkable improvement in memory efficiency now! 🎉

I'll push my commits after confirming a few epochs!

PS: We should still discuss the inference changes you made, because I think we don't need them. As mentioned, let's talk about the details on Monday!

@anwai98 (Contributor) commented Jan 13, 2025

I'll drop this, since we have cleared up a few points of confusion. We will work on finalizing QLoRA in another PR!

anwai98 closed this on Jan 13, 2025