Add `_compute_score` method to PPOTrainer #2560
Draft
What does this PR do?
This PR aims to decouple the score computation logic from the `train` method in `PPOTrainer` by adding a `_compute_score` method.

The `train` method is currently very large and carries many responsibilities. This makes it difficult to customize, especially if you need to make a simple change to the score computation logic, as discussed in #2518. In such cases, you would be forced to override both the `train` and `generate_completions` methods just to modify a few lines of code.

To address this issue, I propose starting the process of decoupling the `train` method. The newly introduced `_compute_score` method encapsulates the same logic found in both `train` and `generate_completions` for computing the scores, ensuring that the default behavior of the class remains unchanged.

With this approach, anyone who wants to implement custom reward logic only needs to override a small, focused method, making the code easier to extend and maintain.
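
To illustrate the intended extension pattern, here is a minimal sketch. It does not use the real TRL classes: `SketchTrainer`, `RuleBasedTrainer`, the `(queries, responses)` signature, and the "ends with a period" rule are all hypothetical stand-ins, and the actual `_compute_score` signature in this PR may differ.

```python
from typing import List, Tuple


class SketchTrainer:
    """Toy stand-in for a trainer (NOT the real PPOTrainer).

    Both train() and generate_completions() delegate scoring to a single
    overridable hook, mirroring the decoupling proposed in this PR.
    """

    def _compute_score(self, queries: List[str], responses: List[str]) -> List[float]:
        # Placeholder default; a real trainer would call a reward model here.
        return [0.0 for _ in responses]

    def train(self, queries: List[str], responses: List[str]) -> List[float]:
        # A real loop would also do rollouts, advantage estimation, and
        # optimizer steps; only the scoring call matters for this sketch.
        return self._compute_score(queries, responses)

    def generate_completions(
        self, queries: List[str], responses: List[str]
    ) -> List[Tuple[str, float]]:
        # Reuses the same hook, so an override affects both code paths.
        scores = self._compute_score(queries, responses)
        return list(zip(responses, scores))


class RuleBasedTrainer(SketchTrainer):
    """Overrides only the scoring hook with a hypothetical rule."""

    def _compute_score(self, queries: List[str], responses: List[str]) -> List[float]:
        # Assumed rule for illustration: reward responses ending in a period.
        return [1.0 if r.strip().endswith(".") else -1.0 for r in responses]
```

Under these assumptions, a subclass changes the reward logic for both code paths by overriding one method, without touching `train` or `generate_completions`.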
Before submitting
Discussed in issue #2518: "[question] best way to have my own reward model which is backed by rules".
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.