Add `_compute_score` method to `PPOTrainer` #2560

oliveiraeliel · 2025-01-11T15:05:47Z

What does this PR do?

This PR aims to decouple the score computation logic from the train method in PPOTrainer by adding a _compute_score method.

The `train` method is currently very large and carries many responsibilities. This makes it difficult to customize, especially if you only need to make a simple change to the score computation logic, as discussed in #2518. In such cases, you are forced to override both the `train` and `generate_completions` methods just to modify a few lines of code.

To address this issue, I propose starting the process of decoupling the train method. The newly introduced _compute_score method encapsulates the same logic found in both train and generate_completions for computing the scores, ensuring that the default behavior of the class remains unchanged.

With this approach, if someone wants to implement a custom reward logic, they only need to override a small, focused method, making the code easier to extend and maintain.
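To illustrate the idea, here is a minimal, self-contained sketch of the hook pattern described above. It does not use trl: `ToyTrainer`, `RuleBasedTrainer`, the `reward_fn` argument, and the `_compute_score(responses)` signature are all hypothetical stand-ins, and the real `PPOTrainer` method may take different arguments.

```python
from typing import Callable, List, Tuple


class ToyTrainer:
    """Toy stand-in for a trainer. This is NOT trl's PPOTrainer."""

    def __init__(self, reward_fn: Callable[[str], float]):
        self.reward_fn = reward_fn

    def _compute_score(self, responses: List[str]) -> List[float]:
        # Default behavior: score each response with the reward function.
        return [self.reward_fn(r) for r in responses]

    def train(self, responses: List[str]) -> float:
        # The large training routine calls the hook instead of inlining the logic.
        scores = self._compute_score(responses)
        return sum(scores) / len(scores)

    def generate_completions(self, responses: List[str]) -> List[Tuple[str, float]]:
        # The same hook is reused here, so both code paths stay consistent.
        scores = self._compute_score(responses)
        return list(zip(responses, scores))


class RuleBasedTrainer(ToyTrainer):
    """Custom reward logic: only the small hook is overridden."""

    def _compute_score(self, responses: List[str]) -> List[float]:
        # Hypothetical rule: reward responses that end with a period.
        return [1.0 if r.strip().endswith(".") else 0.0 for r in responses]


default_trainer = ToyTrainer(reward_fn=lambda r: float(len(r)))
rule_trainer = RuleBasedTrainer(reward_fn=lambda r: 0.0)

default_mean = default_trainer.train(["ab", "abcd"])
rule_mean = rule_trainer.train(["Done.", "not done"])
rule_pairs = rule_trainer.generate_completions(["Done.", "not done"])
```

In this sketch the subclass changes the scoring in both `train` and `generate_completions` without touching either method, which is the benefit the PR is aiming for.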

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. ([question] best way to have my own reward model which is backed by rules #2518)
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

- Refactors the score computation logic by decoupling it from the `train()` method.
- Enables easier customization of reward logic by overriding the `_compute_score` method.

oliveiraeliel · 2025-01-11T15:17:23Z

Could someone please give me some feedback? This is my first PR to trl.

qgallouedec · 2025-01-11T22:13:35Z

Nice! Just make sure to run `make precommit` to apply the right style.

ruff.....................................................................Passed
ruff-format..............................................................Passed
python scripts/add_copyrights.py
Checking 156 Python files for copyright notice...
✅ All files have the required copyright.

oliveiraeliel · 2025-01-12T01:44:17Z

> Nice! Just make sure to run `make precommit` to apply the right style.

I ran `make precommit` and `pytest test/test_ppo_trainer.py`; everything looks OK.

oliveiraeliel and others added 2 commits January 11, 2025 14:05

feat(PPOTrainer): add _compute_score method

909b605

- Refactors the score computation logic by decoupling it from the `train()` method.
- Enables easier customization of reward logic by overriding the `_compute_score` method.

Merge branch 'main' into PPO_compute_score_method

cafef3c

oliveiraeliel changed the title ~~feat(PPOTrainer): add _compute_score method~~ Add _compute_score method to PPOTrainer Jan 11, 2025

oliveiraeliel marked this pull request as ready for review January 11, 2025 15:12

oliveiraeliel marked this pull request as draft January 11, 2025 15:16

oliveiraeliel added 2 commits January 12, 2025 01:38

merging

c857078

Merge branch 'main' into PPO_compute_score_method

fc9ff48
