LongLM isn't compatible with gemma-2-27b-it or gemma-2b-it #46
Comments
Hi! Thanks for your interest! Could you please try printing out the loaded model's architecture? The modification failure is triggered when the target module cannot be found. In your case, for gemma-2, Gemma2ForCausalLM should be modified rather than GemmaForCausalLM; they are different classes in Hugging Face Transformers. We haven't implemented SelfExtend for gemma-2 yet. Another possible problem is that almost every version of Hugging Face Transformers makes some changes to the {Model_name}ForCausalLM class. We will check the newest Gemma implementation in Hugging Face Transformers and release a new version if needed. |
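For reference, a quick way to confirm which *ForCausalLM class transformers actually instantiated is to print the class name of the loaded model; the checkpoint id below is only an example, not taken from this thread:

```python
from transformers import AutoModelForCausalLM

# Example check (not from this thread): Gemma 1 checkpoints load as
# GemmaForCausalLM, Gemma 2 checkpoints as Gemma2ForCausalLM.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
print(type(model).__name__)
```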
Hi, thank you for getting back. I'm using |
Sorry for the oversight. Could you please share the output of print(loaded_model)? This should print out the names of all modules in the loaded model. |
Sorry for the delayed response. I have printed the information that might be helpful for figuring out the issue:

import warnings
warnings.filterwarnings("ignore")

import torch
import json
import time
from transformers.models.llama.modeling_llama import LlamaAttention
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import SelfExtend

# SelfExtend parameters: neighbor window size and group size
window_size = 1024
group_size = 32

# local path to the Gemma checkpoint
model_id = '/tmp/gemma-2b-it/'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# print the module hierarchy of the loaded model
print(model)

Output:
|
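For context, the script above only loads and prints the model; the step that actually fails is applying SelfExtend. Based on the usage shown in the LongLM README, that call looks roughly like the sketch below, but the exact argument names may differ between versions, so treat it as an assumption rather than the definitive API:

```python
# Assumed usage, mirroring the LongLM README; the keyword argument name
# may differ in other versions of SelfExtend.py. Uses the model, group_size,
# and window_size defined in the script above.
SelfExtend.apply(model, group_size, window_size, enable_flash_attention=False)
```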
I also found that CodeLlama cannot be loaded.
Error log:
|
It seems the modification failure is caused by a change of the default attention module. The modification function assumes that the default attention module is "LlamaAttention"/"GemmaAttention"; however, it is actually "LlamaSdpaAttention"/"GemmaSdpaAttention". You may refer to #23 (comment). |
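As an illustration (not part of the original exchange), one way to see which attention class a loaded model actually uses is to walk its modules and collect the class names, using the model loaded in the script above:

```python
# Hypothetical diagnostic: collect the distinct attention-module class names,
# e.g. to see whether the model uses GemmaAttention or GemmaSdpaAttention.
attention_classes = {
    type(module).__name__
    for name, module in model.named_modules()
    if "attn" in name.lower()
}
print(attention_classes)
```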
Yes, by replacing "LlamaAttention" with "LlamaSdpaAttention", it works. Thank you very much. FYI: below is the patch I applied:
|
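The patch itself did not come through above. As a rough sketch of the idea only (the helper name and structure here are illustrative, not LongLM's actual code), the fix amounts to matching the SDPA class name instead of the eager one when rebinding the attention forward method:

```python
import types

def patch_attention_forward(model, new_forward, target_class_name="LlamaSdpaAttention"):
    # Illustrative helper, not the repo's real code: rebind `forward` on every
    # module whose class name matches target_class_name. The reported failure
    # corresponds to no module matching (so nothing gets patched), and the fix
    # is to target "LlamaSdpaAttention"/"GemmaSdpaAttention" instead of
    # "LlamaAttention"/"GemmaAttention".
    patched = 0
    for _, module in model.named_modules():
        if type(module).__name__ == target_class_name:
            module.forward = types.MethodType(new_forward, module)
            patched += 1
    return patched > 0
```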
For CPU users: use the fork from #25; it worked for me. |
I found that the current version of LongLM cannot load Gemma 1 or Gemma 2 models successfully. I wrote a minimal test to help reproduce the issue:
Trying to load the model fails with the error message below:
I found that it fails at the duplicate check on line 24 of SelfExtend.py; when it fails, instance = False. Below is a conda env export dump including the package details of my Python environment: