
Fixing GRPO reward_func being a model with DeepSpeed ZeRO-3 #2984

Open · wants to merge 1 commit into base: main
Conversation

jamesbraza (Contributor)
This PR enables GRPOTrainer's reward_func model to work with DeepSpeed ZeRO-3.

Running the new test with current main:

torchrun --nproc_per_node=1 -m pytest -sv tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_deepspeed_zero3
tests/test_grpo_trainer.py:346:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.12/site-packages/transformers/trainer.py:2241: in train
    return inner_training_loop(
.venv/lib/python3.12/site-packages/transformers/trainer.py:2548: in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
.venv/lib/python3.12/site-packages/transformers/trainer.py:3692: in training_step
    inputs = self._prepare_inputs(inputs)
trl/extras/profiling.py:87: in wrapper
    return func(self, *args, **kwargs)
trl/trainer/grpo_trainer.py:692: in _prepare_inputs
    inputs = self._generate_and_score_completions(inputs)
trl/trainer/grpo_trainer.py:833: in _generate_and_score_completions
    rewards_per_func[:, i] = reward_func(**reward_inputs).logits[:, 0]  # Shape (B*G,)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
.venv/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py:945: in forward
    transformer_outputs = self.model(
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
.venv/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py:535: in forward
    inputs_embeds = self.embed_tokens(input_ids)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/sparse.py:190: in forward
    return F.embedding(

...

E       RuntimeError: 'weight' must be 2-D

.venv/lib/python3.12/site-packages/torch/nn/functional.py:2551: RuntimeError
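A plausible reading of this failure: under ZeRO-3, DeepSpeed partitions every parameter across ranks, and each rank locally holds only a flattened shard rather than the full tensor. Calling the reward model's forward directly therefore hands F.embedding a weight that is not 2-D. The toy sketch below (plain Python, no DeepSpeed or torch; ShardedParam, gather_params, and embedding_forward are all hypothetical stand-ins) illustrates that mechanism and why a gather context around the forward pass fixes it:

```python
# Toy illustration (NOT DeepSpeed itself) of why ZeRO-3 breaks a direct
# forward call: parameters are flat local shards until gathered.
from contextlib import contextmanager


class ShardedParam:
    """Stand-in for a ZeRO-3 partitioned parameter: the full 2-D weight is
    flattened and split across ranks, so locally it appears 1-D."""

    def __init__(self, rows, cols):
        self.full_shape = (rows, cols)
        self.local_shard = [0.0] * (rows * cols // 2)  # this rank's half
        self.gathered = None  # full weight, only present inside a gather

    @property
    def shape(self):
        if self.gathered is not None:
            return self.full_shape
        return (len(self.local_shard),)  # 1-D shard outside a gather


@contextmanager
def gather_params(param):
    """Mimics the effect of gathering partitioned parameters for the
    duration of a forward pass, then re-partitioning them."""
    param.gathered = [[0.0] * param.full_shape[1] for _ in range(param.full_shape[0])]
    try:
        yield param
    finally:
        param.gathered = None


def embedding_forward(param):
    # F.embedding requires a 2-D weight; mirror that check here.
    if len(param.shape) != 2:
        raise RuntimeError("'weight' must be 2-D")
    return "ok"


w = ShardedParam(4, 8)

# Direct call on the sharded parameter fails, as in the stack trace above:
errors = []
try:
    embedding_forward(w)
except RuntimeError as e:
    errors.append(str(e))

# Inside the gather context the forward succeeds:
with gather_params(w):
    result = embedding_forward(w)

print(errors[0])  # 'weight' must be 2-D
print(result)     # ok
```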

@jamesbraza (Contributor, Author)

Is there a standard solution for DeepSpeed tests in CI? I think this is the first integration test for DeepSpeed added to the repo.

In the future, we can expand it to cover #2871 and #2963.

Comment on lines +832 to +834

    with torch.inference_mode(), unwrap_model_for_generation(
        reward_func, self.accelerator
    ) as unwrapped_reward_func:
Member

Why do you need to unwrap here? It seems like you're losing the benefit of using DeepSpeed.

@jamesbraza (Contributor, Author)

Thanks for the question. Yes, without this unwrap the unit test crashes with the stack trace shown in the PR description.

Try running torchrun --nproc_per_node=1 -m pytest -sv tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_deepspeed_zero3 with this PR's unit test, first on main (without the unwrap) and then with the unwrap applied.
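The shape of the fix quoted in the diff hunk can be sketched in isolation. Below, StubRewardModel and the stubbed unwrap_model_for_generation are hypothetical stand-ins (the real unwrap_model_for_generation in trl gathers the ZeRO-3 partitioned parameters; here a flag simulates that), so the control flow of the fixed call site is visible without DeepSpeed:

```python
# Hedged sketch of the fixed call site: score completions inside the
# gather context, so the reward model's full weights are materialized.
from contextlib import contextmanager


class StubRewardModel:
    """Hypothetical reward model: fails unless its parameters are gathered,
    echoing the 'weight' must be 2-D error from the stack trace."""

    def __init__(self):
        self.gathered = False

    def __call__(self, **inputs):
        if not self.gathered:
            raise RuntimeError("'weight' must be 2-D")
        # One scalar reward logit per sequence, shape (B*G, 1) in spirit.
        return {"logits": [[0.5] for _ in inputs["input_ids"]]}


@contextmanager
def unwrap_model_for_generation(model, accelerator=None):
    """Stub: in trl this gathers ZeRO-3 partitioned parameters; here we
    just flip a flag for the duration of the context."""
    model.gathered = True
    try:
        yield model
    finally:
        model.gathered = False


reward_func = StubRewardModel()
reward_inputs = {"input_ids": [[1, 2], [3, 4]]}

# Mirrors the diff hunk: forward happens inside the unwrap context.
with unwrap_model_for_generation(reward_func) as unwrapped_reward_func:
    out = unwrapped_reward_func(**reward_inputs)

print([row[0] for row in out["logits"]])  # [0.5, 0.5]
```

The key design point is that only the forward pass needs the gathered weights; the context manager re-partitions them on exit, so ZeRO-3's memory savings are kept for the rest of the step.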

@jamesbraza changed the title from "Allowing GRPO reward_func to be a model with DeepSpeed ZeRO-3" to "Fixing GRPO reward_func being a model with DeepSpeed ZeRO-3" on Feb 28, 2025.