Upgrade Transformers to v4.38.x #654
Conversation
```diff
@@ -149,3 +152,14 @@ def prepare_inputs_for_generation(
             }
         )
         return model_inputs
+
+    def _load_pretrained_model(cls, model, state_dict, loaded_keys, *args, **kwargs):
```
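The body of the override is not shown in this hunk; the following is a minimal, hypothetical sketch of the kind of key remapping discussed below (assuming a simple prefix rename from `transformer.` to `model.`), not the actual patch code:

```python
# Hypothetical sketch only -- not the actual patch body (which is truncated above).
# Idea: rename checkpoint keys from the "transformer." base-model prefix used by
# Transformers' LlamaForQuestionAnswering to the "model." prefix expected by
# LlamaAdapterModel before the weights are matched against the model.

def remap_qa_state_dict(state_dict: dict) -> dict:
    """Rename 'transformer.*' keys to 'model.*'; leave all other keys untouched."""
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith("transformer."):
            key = "model." + key[len("transformer."):]
        remapped[key] = value
    return remapped


if __name__ == "__main__":
    # Tiny demo with placeholder values instead of real tensors.
    dummy = {"transformer.embed_tokens.weight": None, "qa_outputs.weight": None}
    print(remap_qa_state_dict(dummy))
    # {'model.embed_tokens.weight': None, 'qa_outputs.weight': None}
```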
Why is this rewriting of the state dict required for us but not for HF?
I believe this is the case because, when using Hugging Face directly, the question-answering Llama model is loaded via the `AutoModelForQuestionAnswering` class, which uses the models listed in `MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES`, and that mapping contains `LlamaForQuestionAnswering`. `LlamaForQuestionAnswering` has "transformer" as the base model (https://github.com/huggingface/transformers/blob/2209b7afa04b3a6366350065f541e9248d6663c2/src/transformers/models/llama/modeling_llama.py#L1460). But we load all models through the same `AutoAdapterModel` class: in `LlamaAdapterModel`, we set `self.model = LlamaModel(config)`, and because of this the `state_dict` and model keys must have "model" as the base model prefix.
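A quick way to see the mismatch described above (a sketch assuming Transformers v4.38.x, where `LlamaForQuestionAnswering` was introduced):

```python
# Illustration of the prefix mismatch described above (assumes transformers v4.38.x).
from transformers.models.llama.modeling_llama import LlamaForQuestionAnswering

# Prints "transformer" in v4.38.x (see the linked modeling_llama.py source),
# so checkpoints saved from this class store the backbone weights as "transformer.*".
print(LlamaForQuestionAnswering.base_model_prefix)

# LlamaAdapterModel instead wraps the backbone as `self.model = LlamaModel(config)`,
# so it expects (and saves) the same weights under the "model.*" prefix.
```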
@calpt, as you pointed out, this becomes a problem as soon as we load a Llama model that is sharded. We should also add to the docs the information that sharded models cannot be loaded as flex models with their heads.
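As a possible pointer for that docs note, a sharded Hugging Face checkpoint can usually be recognised by its index file; a small sketch (assuming the standard sharding layout, with a hypothetical local path in the example):

```python
# Sketch: detect whether a local checkpoint directory is sharded.
# Sharded Hugging Face checkpoints ship an index JSON (e.g.
# "model.safetensors.index.json" or "pytorch_model.bin.index.json")
# that maps parameter names to the shard files containing them.
from pathlib import Path


def is_sharded_checkpoint(checkpoint_dir: str) -> bool:
    return any(Path(checkpoint_dir).glob("*.index.json"))


# Example with a hypothetical local path:
# print(is_sharded_checkpoint("./llama-7b-checkpoint"))
```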
This seems to be a rather complex code patch for something that is likely not really used. I'd rather just drop support for LlamaForQA altogether if it doesn't work for sharded checkpoints anyway, or do you think this is an important use case to support?
Removed the LlamaForQA support. While updating the docs, I noticed that our Sphinx version had some issues that others encountered, too: sphinx-doc/sphinx#11890. I also noticed that since #641 the docs no longer include the `add_XXX_head` methods for every model; we have to fix this. I didn't find a quick fix, so I suppose we have to address it in a separate PR. @calpt, I believe we can merge this PR after you approve it.
In the live version of our docs, we do have those, right? E.g. https://docs.adapterhub.ml/classes/models/llama.html#adapters.LlamaAdapterModel.add_classification_head or https://docs.adapterhub.ml/classes/models/bert.html#adapters.BertAdapterModel.add_dependency_parsing_head. So is this issue introduced by the changes in this PR?
@lenglaender I've fixed the docs build issues.
Changes:
- `LlamaForQuestionAnswering`: this model has a wrong base model name in Transformers. I added a workaround that solves this problem until it is fixed upstream (Fix Base Model Name of LlamaForQuestionAnswering, huggingface/transformers#29258).
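For reference, a small check (purely illustrative, not the workaround used in this PR) can tell whether the installed Transformers version still carries the wrong base model name:

```python
# Illustrative check only -- the PR's actual workaround rewrites state-dict keys
# during loading. This just reports whether the installed Transformers version is
# still affected by the wrong base model name of LlamaForQuestionAnswering.
import transformers
from transformers.models.llama.modeling_llama import LlamaForQuestionAnswering

prefix = LlamaForQuestionAnswering.base_model_prefix
print(f"transformers {transformers.__version__}: base_model_prefix={prefix!r}")
if prefix == "transformer":
    print("Affected by huggingface/transformers#29258 -- workaround still needed.")
```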