Merge branch 'main' into fix-grpo-logits-calc

huggingface · Jan 31, 2025 · 41f195f
2 parents 3332a22 + bf69191
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/README.md b/README.md
@@ -139,7 +139,7 @@ trainer.train()
 
 ### `GRPOTrainer`
 
-`GRPOTrainer` implements a [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models]([https://huggingface.co/papers/2402.14740](https://huggingface.co/papers/2402.03300)) for reinforcement learning. Group Relative Policy Optimization (GRPO) is more performant than PPO and was used to train [Deepseek AI's R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).
+`GRPOTrainer` implements the [Group Relative Policy Optimization (GRPO) algorithm](https://huggingface.co/papers/2402.03300) that is more memory-efficient than PPO and was used to train [Deepseek AI's R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).
 
 ```python
 from datasets import load_dataset