fix doc
kashif committed Jan 17, 2025
1 parent b2f017f commit 67adbfe
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/rloo_trainer.md
@@ -276,7 +276,7 @@ The [Reinforce++](https://hijkzzz.notion.site/reinforce-plus-plus) report by Jia
- Clipping rewards: limiting reward values within a specific range to mitigate the impact of extreme rewards on model updates, thus preventing gradient explosion
- Normalizing rewards: scaling rewards to have a mean of 0 and a standard deviation of 1, which helps in stabilizing the training process
- Normalizing advantages: scaling advantages to have a mean of 0 and a standard deviation of 1, which helps in stabilizing the training process
-- Using token-level KL penalty that vs. sequence-level KL penalty (default)
+- Using token-level KL penalty that is defined as equation (1) of the report vs. sequence-level KL penalty (default)

These options are available via the appropriate arguments in the [`RLOOConfig`] class.
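
For readers of this diff, the four options listed above can be illustrated with a small standalone sketch. This is not the trainer's implementation: the function names, the default `clip_range` and `kl_coef` values, and the use of a simple per-token log-ratio as the KL estimate are all assumptions made for illustration, and the sketch does not claim to reproduce equation (1) of the report.

```python
from statistics import fmean, pstdev


def clip_rewards(rewards, clip_range=10.0):
    """Limit each reward to [-clip_range, clip_range] (hypothetical helper)."""
    return [max(-clip_range, min(clip_range, r)) for r in rewards]


def normalize(values, eps=1e-8):
    """Rescale values to mean 0 and approximately standard deviation 1.

    The same helper can be applied to rewards or to advantages.
    """
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def kl_penalty(logprobs, ref_logprobs, kl_coef=0.05, token_level=False):
    """Penalty from the log-ratio between policy and reference log-probs.

    token_level=True returns one penalty per token;
    token_level=False returns a single penalty summed over the sequence.
    The log-ratio estimator is an assumption of this sketch.
    """
    per_token = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]
    return per_token if token_level else sum(per_token)


if __name__ == "__main__":
    rewards = clip_rewards([-20.0, 0.5, 15.0])
    print(rewards)
    print(normalize(rewards))
    print(kl_penalty([-1.0, -2.0], [-1.5, -1.0], token_level=True))
```

The only difference between the two KL modes in this sketch is whether the per-token terms are kept separate or summed into a single value for the sequence.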
