Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add _compute_score method to PPOTrainer #2560

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

oliveiraeliel
Copy link

@oliveiraeliel oliveiraeliel commented Jan 11, 2025

What does this PR do?

This PR aims to decouple the score computation logic from the train method in PPOTrainer by adding a _compute_score method.

The train method is currently very large and encompasses a lot of responsibilities. This makes it difficult to customize, especially if you need to make a simple change to the score computation logic, as discussed in #2518. In such cases, you would be forced to override both the train and generate_completions methods just to modify a few lines of code.

To address this issue, I propose starting the process of decoupling the train method. The newly introduced _compute_score method encapsulates the same logic found in both train and generate_completions for computing the scores, ensuring that the default behavior of the class remains unchanged.

With this approach, if someone wants to implement a custom reward logic, they only need to override a small, focused method, making the code easier to extend and maintain.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

oliveiraeliel and others added 2 commits January 11, 2025 14:05
- Refactors the score computation logic by decoupling it from the `train()` method.
- Enables easier customization of reward logic by overriding the `_compute_score` method.
@oliveiraeliel oliveiraeliel changed the title feat(PPOTrainer): add _compute_score method Add _compute_score method to PPOTrainer Jan 11, 2025
@oliveiraeliel oliveiraeliel marked this pull request as ready for review January 11, 2025 15:12
@oliveiraeliel oliveiraeliel marked this pull request as draft January 11, 2025 15:16
@oliveiraeliel
Copy link
Author

Please, can someone give me some feedback? It is my first PR to trl

@qgallouedec
Copy link
Member

Nice! just make sure to run make precommit to apply the right style

ruff.....................................................................�[42mPassed�[m
ruff-format..............................................................�[42mPassed�[m
python scripts/add_copyrights.py
Checking 156 Python files for copyright notice...
✅ All files have the required copyright.
@oliveiraeliel
Copy link
Author

oliveiraeliel commented Jan 12, 2025

Nice! just make sure to run make precommit to apply the right style

I ran the make precommit and pytest test/test_ppo_trainer.py, everything looks ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants