Commit

fix citation
kashif committed Jan 24, 2025
1 parent f8a33e3 commit 693bb4e
Showing 1 changed file with 6 additions and 5 deletions.
11 changes: 6 additions & 5 deletions trl/trainer/grpo_trainer.py
@@ -583,11 +583,12 @@ def create_model_card(

citation = textwrap.dedent(
"""\
-@article{zhihong2024deepseekmath,
-title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-year = 2024,
-eprint = {arXiv:2402.03300},
+@article{ramesh2024grpo,
+title={Group Robust Preference Optimization in Reward-free RLHF},
+author={Shyam Sundhar Ramesh, Iason Chaimalas, Viraj Mehta, Haitham Bou Ammar,
+Pier Giuseppe Sessa, Yifan Hu, Ilija Bogunovic},
+year={2024}
}
"""
)
