Upgrade Transformers to v4.38.x #654

Merged (12 commits) on Apr 6, 2024
Conversation

lenglaender (Member)

Changes:

@lenglaender requested a review from @calpt on February 23, 2024, 19:09
@@ -149,3 +152,14 @@ def prepare_inputs_for_generation(
}
)
return model_inputs

def _load_pretrained_model(cls, model, state_dict, loaded_keys, *args, **kwargs):
Member
Why is this rewriting of the state dict required for us but not for HF?

Member Author

I believe this is the case because, with plain Hugging Face Transformers, the question answering Llama is loaded through the AutoModelForQuestionAnswering class, which uses the models listed in MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES, and this mapping contains LlamaForQuestionAnswering. LlamaForQuestionAnswering uses "transformer" as its base model prefix (https://github.com/huggingface/transformers/blob/2209b7afa04b3a6366350065f541e9248d6663c2/src/transformers/models/llama/modeling_llama.py#L1460).

But we load all models using the same AutoAdapterModel class: in LlamaAdapterModel we set self.model = LlamaModel(config), so the state_dict and model keys must use "model" as the base model prefix.
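To make the mismatch concrete, here is a minimal sketch of the kind of key remapping such an override performs; the helper name and exact logic are illustrative, not the actual patch:

```python
# Illustrative helper (not the PR code): rename checkpoint keys saved with
# HF's LlamaForQuestionAnswering base model prefix ("transformer") so they
# match LlamaAdapterModel, which stores the backbone as `self.model`.
def rename_base_model_prefix(state_dict, old_prefix="transformer", new_prefix="model"):
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix + "."):
            # e.g. "transformer.embed_tokens.weight" -> "model.embed_tokens.weight"
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed
```

The overridden _load_pretrained_model would apply a remapping of this kind to the incoming state dict (and the corresponding loaded keys) before delegating to the parent implementation.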

@lenglaender (Member Author)

@calpt, as you pointed out, as soon as we load a sharded Llama model via AutoAdapterModel, the head is not added.
But if someone saved a smaller, non-sharded LlamaForQuestionAnswering (e.g. one based on TinyLlama), we can load it. So, for this use case, having the _load_pretrained_model function overridden is useful.
I believe we can merge the sync PR now, right?

We should also add to the docs that sharded models cannot be loaded as flex head models together with their heads.
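For illustration, a rough sketch of the use case being discussed, assuming the _load_pretrained_model override from this PR is in place; the checkpoint name is a placeholder and this is not code from the PR:

```python
from transformers import AutoModelForQuestionAnswering
from adapters import AutoAdapterModel

# Save a small, non-sharded LlamaForQuestionAnswering checkpoint
# (placeholder model name; any small Llama checkpoint would do).
qa_model = AutoModelForQuestionAnswering.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
qa_model.save_pretrained("./tiny-llama-qa")

# Load it as a flex head model: for a non-sharded checkpoint the QA head
# can be converted, whereas for sharded checkpoints the head is not added.
model = AutoAdapterModel.from_pretrained("./tiny-llama-qa")
```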

@calpt (Member) commented Mar 12, 2024

> @calpt, as you pointed out, as soon as we load a sharded Llama model via AutoAdapterModel, the head is not added. But if someone saved a smaller, non-sharded LlamaForQuestionAnswering (e.g. one based on TinyLlama), we can load it. So, for this use case, having the _load_pretrained_model function overridden is useful. I believe we can merge the sync PR now, right?
>
> We should also add to the docs that sharded models cannot be loaded as flex head models together with their heads.

This seems to be a rather complex code patch for something that is likely not really used. I'd rather drop support for LlamaForQA altogether if it doesn't work for sharded checkpoints anyway. Or do you think this is an important use case to support?

@lenglaender (Member Author)

Removed _load_pretrained_model for the LlamaAdapterModel and updated the docs.

While updating the docs, I noticed that our Sphinx version ran into an issue that others have encountered too: sphinx-doc/sphinx#11890
Hence, I pinned the version to 5.0.2. This seems to change nothing; at least, I couldn't find any visual differences in the newly built docs.

I also noticed that since #641 the docs no longer include the add_XXX_head methods for every model; we have to fix this. I didn't find a quick fix, so I suppose we have to address it in a separate PR.

@calpt, I believe we can merge this PR once you approve it.

@calpt (Member) commented Mar 20, 2024

> I also noticed that since #641 the docs no longer include the add_XXX_head methods for every model; we have to fix this. I didn't find a quick fix, so I suppose we have to address it in a separate PR.

In the live version of our docs, we do have those, right? E.g. https://docs.adapterhub.ml/classes/models/llama.html#adapters.LlamaAdapterModel.add_classification_head or https://docs.adapterhub.ml/classes/models/bert.html#adapters.BertAdapterModel.add_dependency_parsing_head.

So is this issue introduced by the changes in this PR?
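For context, the add_XXX_head methods discussed here attach flexible prediction heads to an AutoAdapterModel; a minimal usage sketch, where the checkpoint and head names are placeholders:

```python
from adapters import AutoAdapterModel

# Placeholder checkpoint; any backbone supported by the adapters library works.
model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# add_XXX_head methods attach a flexible prediction head to the backbone;
# these are the methods whose doc entries are being discussed above.
model.add_classification_head("sentiment", num_labels=2)
```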

@calpt (Member) commented Apr 6, 2024

@lenglaender I've fixed the docs build issues.

@calpt changed the title from "Upgrade Transformers to v4.38.1" to "Upgrade Transformers to v4.38.x" on Apr 6, 2024
@calpt added the sync label on Apr 6, 2024
@calpt merged commit a9152e7 into adapter-hub:main on Apr 6, 2024
3 checks passed