Subset of modules with tied parameters being retrieved in get_module_size_with_ties #3308

Open

pablomlago opened this issue Dec 21, 2024 · 2 comments · May be fixed by #3327
Comments

@pablomlago

System Info

- `Accelerate` version: 1.2.0.dev0
- Platform: Linux-5.4.0-186-generic-x86_64-with-glibc2.31
- Python version: 3.12.0

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

The following snippet minimally reproduces the behaviour that I am observing in my model:

import torch.nn as nn
from accelerate.utils.modeling import get_module_size_with_ties

# N must be greater than 10 for this anomalous behaviour to appear.
# hidden_size and module sizes are set to arbitrary values.
N = 15
hidden_size = 2
module_size = 1000

tied_params = [f"model.layers.{i}.weight" for i in range(1, N)]
module_sizes = dict(**{f"model.layers.{i}": 1000 for i in range(N)}, **{f"model.layers.{i}.weight": 10 for i in range(N)})
modules_to_treat = [(f"model.layers.{i}", nn.Linear(hidden_size, hidden_size)) for i in range(1,N)]

module_size_with_ties, tied_module_names, tied_modules = get_module_size_with_ties(tied_params, module_size, module_sizes, modules_to_treat)
print(tied_module_names)

The output is:

tied_module_names = ['model.layers.1', 'model.layers.2', 'model.layers.3', 'model.layers.4', 'model.layers.5', 'model.layers.6', 'model.layers.7', 'model.layers.8', 'model.layers.9', 'model.layers.1', 'model.layers.1', 'model.layers.1', 'model.layers.1', 'model.layers.1']

Note that the last five elements of the list are all 'model.layers.1'.

Expected behavior

When a model has child modules contained in a ModuleList, the names of these submodules include a number indicating their position in the container, e.g. "model.layers.0". The module to which a tied parameter belongs is identified by checking that the module name is contained in the parameter name (see https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/modeling.py#L1186):

tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n in tied_param][0]

However, if, for instance, n = "model.layers.1" and tied_param = "model.layers.11.weight", the check evaluates to True even though the parameter does not belong to that module. Consequently, the tied_modules list returned by get_module_size_with_ties only contains modules with single-digit indices (model.layers.1 up to model.layers.9), as illustrated in the reproduction snippet. As a result, not all the modules in modules_to_treat holding a tied parameter are placed onto the appropriate device, which causes a crash when they are processed afterwards in infer_auto_device_map, since the list [i for i, (n, _) in enumerate(modules_to_treat) if n in tied_param] is then empty.
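
To make the false positive concrete, here is a minimal check (the values are just the ones from the example above):

n = "model.layers.1"

# Substring containment matches every double-digit sibling of layer 1:
print(n in "model.layers.1.weight")   # True  (correct match)
print(n in "model.layers.11.weight")  # True  (false positive)
print(n in "model.layers.14.weight")  # True  (false positive)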

Is it possible that changing the check to (n + ".") in tied_param would suffice to tackle this issue?

tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if (n + ".") in tied_param][0]

By doing so, it is guaranteed that the parameter actually belongs to the module. I tested this change with the snippet above, and the result is:

tied_module_names = ['model.layers.1', 'model.layers.2', 'model.layers.3', 'model.layers.4', 'model.layers.5', 'model.layers.6', 'model.layers.7', 'model.layers.8', 'model.layers.9', 'model.layers.10', 'model.layers.11', 'model.layers.12', 'model.layers.13', 'model.layers.14']
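
For reference, a unit test covering this case could be built directly on the snippet above. The test name and assertion below are only a sketch of what such a test might look like, not the actual test from the PR:

import torch.nn as nn
from accelerate.utils.modeling import get_module_size_with_ties

def test_tied_params_resolve_to_parent_modules_beyond_ten_layers():
    N = 15
    tied_params = [f"model.layers.{i}.weight" for i in range(1, N)]
    module_sizes = dict(
        **{f"model.layers.{i}": 1000 for i in range(N)},
        **{f"model.layers.{i}.weight": 10 for i in range(N)},
    )
    modules_to_treat = [(f"model.layers.{i}", nn.Linear(2, 2)) for i in range(1, N)]

    _, tied_module_names, _ = get_module_size_with_ties(
        tied_params, 1000, module_sizes, modules_to_treat
    )

    # With the fix, every tied parameter maps to its own parent module, with no repeats.
    assert tied_module_names == [f"model.layers.{i}" for i in range(1, N)]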

@BenjaminBossan
Member

Thanks for reporting this issue. Indeed, what you show looks pretty much like a bug and the solution you propose looks sound. Would you be interested in creating a PR to fix this, also including a unit test based on your example?

@pablomlago
Author

Thanks for your response @BenjaminBossan! Sure, I'll open a PR to fix it.

pablomlago added a commit to pablomlago/accelerate that referenced this issue Jan 7, 2025
Ensure that tied parameters are assigned to their parent module in
get_module_size_with_ties

Fixes: huggingface#3308
@pablomlago pablomlago linked a pull request Jan 7, 2025 that will close this issue