
Fixing GRPO reward_func being a model with DeepSpeed ZeRO-3 #2984

Open · wants to merge 1 commit into base: main
Conversation

jamesbraza (Contributor)
This PR enables GRPOTrainer's reward_func model to work with DeepSpeed ZeRO-3.

Running the new test with current main:

torchrun --nproc_per_node=1 -m pytest -sv tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_deepspeed_zero3
tests/test_grpo_trainer.py:346:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.12/site-packages/transformers/trainer.py:2241: in train
    return inner_training_loop(
.venv/lib/python3.12/site-packages/transformers/trainer.py:2548: in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
.venv/lib/python3.12/site-packages/transformers/trainer.py:3692: in training_step
    inputs = self._prepare_inputs(inputs)
trl/extras/profiling.py:87: in wrapper
    return func(self, *args, **kwargs)
trl/trainer/grpo_trainer.py:692: in _prepare_inputs
    inputs = self._generate_and_score_completions(inputs)
trl/trainer/grpo_trainer.py:833: in _generate_and_score_completions
    rewards_per_func[:, i] = reward_func(**reward_inputs).logits[:, 0]  # Shape (B*G,)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
.venv/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py:945: in forward
    transformer_outputs = self.model(
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
.venv/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py:535: in forward
    inputs_embeds = self.embed_tokens(input_ids)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
.venv/lib/python3.12/site-packages/torch/nn/modules/sparse.py:190: in forward
    return F.embedding(

...

E       RuntimeError: 'weight' must be 2-D

.venv/lib/python3.12/site-packages/torch/nn/functional.py:2551: RuntimeError
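A plausible reading of this failure: under ZeRO-3, DeepSpeed partitions every parameter across ranks, and each rank locally holds only a flattened shard rather than the full tensor. Calling the reward model's forward directly therefore hands F.embedding a weight that is not 2-D. The toy sketch below (plain Python, no DeepSpeed or torch; ShardedParam, gather_params, and embedding_forward are all hypothetical stand-ins) illustrates that mechanism and why a gather context around the forward pass fixes it:

```python
# Toy illustration (NOT DeepSpeed itself) of why ZeRO-3 breaks a direct
# forward call: parameters are flat local shards until gathered.
from contextlib import contextmanager


class ShardedParam:
    """Stand-in for a ZeRO-3 partitioned parameter: the full 2-D weight is
    flattened and split across ranks, so locally it appears 1-D."""

    def __init__(self, rows, cols):
        self.full_shape = (rows, cols)
        self.local_shard = [0.0] * (rows * cols // 2)  # this rank's half
        self.gathered = None  # full weight, only present inside a gather

    @property
    def shape(self):
        if self.gathered is not None:
            return self.full_shape
        return (len(self.local_shard),)  # 1-D shard outside a gather


@contextmanager
def gather_params(param):
    """Mimics the effect of gathering partitioned parameters for the
    duration of a forward pass, then re-partitioning them."""
    param.gathered = [[0.0] * param.full_shape[1] for _ in range(param.full_shape[0])]
    try:
        yield param
    finally:
        param.gathered = None


def embedding_forward(param):
    # F.embedding requires a 2-D weight; mirror that check here.
    if len(param.shape) != 2:
        raise RuntimeError("'weight' must be 2-D")
    return "ok"


w = ShardedParam(4, 8)

# Direct call on the sharded parameter fails, as in the stack trace above:
errors = []
try:
    embedding_forward(w)
except RuntimeError as e:
    errors.append(str(e))

# Inside the gather context the forward succeeds:
with gather_params(w):
    result = embedding_forward(w)

print(errors[0])  # 'weight' must be 2-D
print(result)     # ok
```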

@jamesbraza (Contributor, Author)

Is there a standard solution for DeepSpeed tests in CI? I think this is the first integration test for DeepSpeed added to the repo.

In the future, we can expand it to cover #2871 and #2963.

Comment on lines +832 to +834

    with torch.inference_mode(), unwrap_model_for_generation(
        reward_func, self.accelerator
    ) as unwrapped_reward_func:
Member

Why do you need to unwrap here? It seems like you're losing the benefit of using DeepSpeed.

@jamesbraza (Contributor, Author)

Thanks for the question. Yes, without this unwrap the unit test crashes with the stack trace shown in the PR description.

Try running torchrun --nproc_per_node=1 -m pytest -sv tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_deepspeed_zero3 with this PR's unit test, first on main (without the unwrap) and then with the unwrap applied.
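The shape of the fix quoted in the diff hunk can be sketched in isolation. Below, StubRewardModel and the stubbed unwrap_model_for_generation are hypothetical stand-ins (the real unwrap_model_for_generation in trl gathers the ZeRO-3 partitioned parameters; here a flag simulates that), so the control flow of the fixed call site is visible without DeepSpeed:

```python
# Hedged sketch of the fixed call site: score completions inside the
# gather context, so the reward model's full weights are materialized.
from contextlib import contextmanager


class StubRewardModel:
    """Hypothetical reward model: fails unless its parameters are gathered,
    echoing the 'weight' must be 2-D error from the stack trace."""

    def __init__(self):
        self.gathered = False

    def __call__(self, **inputs):
        if not self.gathered:
            raise RuntimeError("'weight' must be 2-D")
        # One scalar reward logit per sequence, shape (B*G, 1) in spirit.
        return {"logits": [[0.5] for _ in inputs["input_ids"]]}


@contextmanager
def unwrap_model_for_generation(model, accelerator=None):
    """Stub: in trl this gathers ZeRO-3 partitioned parameters; here we
    just flip a flag for the duration of the context."""
    model.gathered = True
    try:
        yield model
    finally:
        model.gathered = False


reward_func = StubRewardModel()
reward_inputs = {"input_ids": [[1, 2], [3, 4]]}

# Mirrors the diff hunk: forward happens inside the unwrap context.
with unwrap_model_for_generation(reward_func) as unwrapped_reward_func:
    out = unwrapped_reward_func(**reward_inputs)

print([row[0] for row in out["logits"]])  # [0.5, 0.5]
```

The key design point is that only the forward pass needs the gathered weights; the context manager re-partitions them on exit, so ZeRO-3's memory savings are kept for the rest of the step.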

@jamesbraza changed the title from "Allowing GRPO reward_func to be a model with DeepSpeed ZeRO-3" to "Fixing GRPO reward_func being a model with DeepSpeed ZeRO-3" on Feb 28, 2025.