System Info

- `Accelerate` version: 1.2.0.dev0
- Platform: Linux-5.4.0-186-generic-x86_64-with-glibc2.31
- Python version: 3.12.0

Tasks

- One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
The following snippet minimally reproduces the behaviour that I am observing in my model:
```python
import torch.nn as nn

from accelerate.utils.modeling import get_module_size_with_ties

# N must be greater than 10 for this anomalous behaviour to appear.
# hidden_size and the module sizes are set to arbitrary values.
N = 15
hidden_size = 2
module_size = 1000
tied_params = [f"model.layers.{i}.weight" for i in range(1, N)]
module_sizes = dict(
    **{f"model.layers.{i}": module_size for i in range(N)},
    **{f"model.layers.{i}.weight": 10 for i in range(N)},
)
modules_to_treat = [(f"model.layers.{i}", nn.Linear(hidden_size, hidden_size)) for i in range(1, N)]
module_size_with_ties, tied_module_names, tied_modules = get_module_size_with_ties(
    tied_params, module_size, module_sizes, modules_to_treat
)
print(tied_module_names)
```
The output is:

```
tied_module_names = ['model.layers.1', 'model.layers.2', 'model.layers.3', 'model.layers.4', 'model.layers.5', 'model.layers.6', 'model.layers.7', 'model.layers.8', 'model.layers.9', 'model.layers.1', 'model.layers.1', 'model.layers.1', 'model.layers.1', 'model.layers.1']
```

Note that the last five elements of the list are all `'model.layers.1'`.
Expected behavior
When a model has child modules contained in a `ModuleList`, the names of these submodules include a number indicating their position in the container, e.g. `"model.layers.0"`. However, the module to which a tied parameter belongs is identified by checking that the module name is contained in the parameter name (see https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/modeling.py#L1186):

```python
tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n in tied_param][0]
```
However, if, for instance, `n = "model.layers.1"` and `tied_param = "model.layers.11.weight"`, the check evaluates to `True` even though the parameter does not belong to the given module. Consequently, the `tied_modules` list returned by `get_module_size_with_ties` only contains the modules from `model.layers.1` up to `model.layers.9`, as illustrated in the reproduction snippet. As a result, not all of the modules in `modules_to_treat` that hold a tied parameter are placed on the appropriate device, which causes a crash when they are processed afterwards in `infer_auto_device_map`, since the list `[i for i, (n, _) in enumerate(modules_to_treat) if n in tied_param]` is then empty.
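To make the failure mode concrete, the check can be exercised directly on the names from the example above (a minimal sketch; Python's `in` on strings tests substring containment):

```python
# Substring containment, as performed by the original check `n in tied_param`:
n = "model.layers.1"
print(n in "model.layers.1.weight")   # True  -- genuine match
print(n in "model.layers.11.weight")  # True  -- false positive: layer 11 is not layer 1
print(n in "model.layers.2.weight")   # False
```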
Is it possible that changing the check to `(n + ".") in tied_param` would suffice to tackle this issue?

```python
tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if (n + ".") in tied_param][0]
```
By doing so, it would be guaranteed that the parameter belongs to the module. I tested this solution with the snippet above, and the result is:

```
tied_module_names = ['model.layers.1', 'model.layers.2', 'model.layers.3', 'model.layers.4', 'model.layers.5', 'model.layers.6', 'model.layers.7', 'model.layers.8', 'model.layers.9', 'model.layers.10', 'model.layers.11', 'model.layers.12', 'model.layers.13', 'model.layers.14']
```
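For comparison, the proposed check applied to the same names (again a minimal sketch):

```python
n = "model.layers.1"
print((n + ".") in "model.layers.1.weight")   # True  -- genuine match kept
print((n + ".") in "model.layers.11.weight")  # False -- false positive removed
```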
Thanks for reporting this issue. Indeed, what you show looks pretty much like a bug and the solution you propose looks sound. Would you be interested in creating a PR to fix this, also including a unit test based on your example?
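A unit test based on the reproduction above might look something like the following (a sketch only: the test name is hypothetical, and it assumes `get_module_size_with_ties` keeps the signature used in the snippet):

```python
import torch.nn as nn

from accelerate.utils.modeling import get_module_size_with_ties


def test_get_module_size_with_ties_ignores_name_prefix_collisions():
    # Hypothetical test based on the reproduction above: with more than
    # 10 layers, "model.layers.1" must not absorb layers 10 through 14.
    N = 15
    tied_params = [f"model.layers.{i}.weight" for i in range(1, N)]
    module_sizes = dict(
        **{f"model.layers.{i}": 1000 for i in range(N)},
        **{f"model.layers.{i}.weight": 10 for i in range(N)},
    )
    modules_to_treat = [(f"model.layers.{i}", nn.Linear(2, 2)) for i in range(1, N)]
    _, tied_module_names, _ = get_module_size_with_ties(
        tied_params, 1000, module_sizes, modules_to_treat
    )
    assert tied_module_names == [f"model.layers.{i}" for i in range(1, N)]
```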