Process reward models #2241

Status: Merged (26 commits, Jan 29, 2025)

@SalmanMohammadi (Contributor) commented on Jan 7, 2025:

Fixing num_labels for reward models

  1. Reward models should be initialized with num_labels=1, so that they output a single scalar score. I've added a num_labels field that is set in cfg_kwargs rather than model_kwargs, because transformers reads this value from config.num_labels. This also lets us correctly initialize process reward models with num_labels=2.
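To make the split concrete, here is a minimal, stdlib-only sketch of how num_labels could be chosen and routed to the config side. The helper `reward_cfg_kwargs` and its `reward_model` / `process_reward_model` flags are hypothetical illustrations, not axolotl's actual code; the commented usage assumes the standard transformers behaviour where keyword arguments to `AutoConfig.from_pretrained` override config attributes.

```python
from typing import Dict


def reward_cfg_kwargs(reward_model: bool = False, process_reward_model: bool = False) -> Dict[str, int]:
    """Hypothetical helper: choose the config-level num_labels for each reward-model type."""
    if reward_model and process_reward_model:
        raise ValueError("enable either reward_model or process_reward_model, not both")
    if reward_model:
        # Outcome reward model: one scalar score per sequence.
        return {"num_labels": 1}
    if process_reward_model:
        # Process reward model: binary (incorrect / correct) label per reasoning step.
        return {"num_labels": 2}
    # Ordinary language modelling: leave the config untouched.
    return {}


# Usage sketch (requires transformers; not executed here). num_labels goes to the
# config rather than to the model constructor kwargs:
#   cfg_kwargs = reward_cfg_kwargs(process_reward_model=True)
#   config = AutoConfig.from_pretrained(base_model, **cfg_kwargs)
#   model = AutoModelForTokenClassification.from_pretrained(base_model, config=config)
print(reward_cfg_kwargs(reward_model=True))
print(reward_cfg_kwargs(process_reward_model=True))
```

Keeping num_labels on the config matters because the classification head's output size is derived from config.num_labels when the model is built.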

Adding support for process reward model training

I've added support for the PRMTrainer from TRL, along with the dataset format it expects. A screenshot of a successful training run is below.

[Screenshot: successful process reward model training run]

The run produced the following trained model: https://huggingface.co/smohammadi/Qwen2.5-3B-MathShepherd
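For the dataset side, TRL's PRMTrainer works with "stepwise supervision" examples: a prompt, a list of completion steps, and one boolean label per step. The sketch below, assuming that layout, shows the core labelling idea: each step is closed by a separator, only the last token of each step carries the step's 0/1 label, and every other position is masked with -100. The whitespace tokenizer `toy_tokenize` and the function `build_step_labels` are hypothetical stand-ins; TRL's actual preprocessing operates on token IDs from a real tokenizer.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # positions with this label are skipped by PyTorch's cross-entropy loss


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real tokenizer: split on whitespace."""
    return text.split()


def build_step_labels(example: Dict, step_separator: str = "\n") -> Dict[str, List]:
    """Label the last token of each step with 0/1 and mask every other position."""
    tokens: List[str] = toy_tokenize(example["prompt"])
    labels: List[int] = [IGNORE_INDEX] * len(tokens)  # prompt tokens are never supervised
    for step, is_correct in zip(example["completions"], example["labels"]):
        step_tokens = toy_tokenize(step) + [step_separator]  # separator closes each step
        tokens.extend(step_tokens)
        labels.extend([IGNORE_INDEX] * (len(step_tokens) - 1) + [int(is_correct)])
    return {"tokens": tokens, "labels": labels}


example = {
    "prompt": "Compute 2+3*4",
    "completions": ["3*4 = 12", "2 + 12 = 15"],
    "labels": [True, False],  # the second step contains an arithmetic error
}
out = build_step_labels(example)
print([label for label in out["labels"] if label != IGNORE_INDEX])  # [1, 0]
```

Combined with num_labels=2 and a token-classification head, the model learns to predict correct/incorrect at each step boundary.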

@SalmanMohammadi changed the title from "Setting num_labels for reward models" to "Process reward models" on Jan 7, 2025.
@SalmanMohammadi marked this pull request as ready for review on January 23, 2025 at 11:24.

@winglian (Collaborator) left a comment:

A couple of minor things, but it should be good to go after that.

Resolved review threads:
- src/axolotl/core/trainer_builder.py
- src/axolotl/utils/config/models/input/v0_4_1/__init__.py (outdated)
- tests/e2e/test_process_reward_model_llama.py (outdated)

@winglian (Collaborator) left a comment:

Looks great to me. Thanks!

@winglian merged commit 54dd7ab into main on Jan 29, 2025 (11 checks passed).
@winglian deleted the fix_reward_model branch on January 29, 2025 at 05:08.