GRPO generates duplicate samples; HF has fixed this and the fix needs to be synced here #3143

Closed
MrToy opened this issue Feb 17, 2025 · 4 comments

Comments

@MrToy
Contributor

MrToy commented Feb 17, 2025

Related PR: https://github.com/huggingface/trl/pull/2824/files
Related issue: huggingface/trl#2776 (comment)

@Jintao-Huang
Collaborator

Please pull the latest code from the main branch and try again.

@Jintao-Huang
Collaborator

This part should already be synced.

@MrToy
Contributor Author

MrToy commented Feb 17, 2025

https://github.com/modelscope/ms-swift/blob/main/swift/trainers/rlhf_trainer/grpo_trainer.py#L35
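The HF implementation of GRPOTrainer.__init__ was removed here and replaced with a custom implementation, so fixes made in the trl library are not picked up automatically.
The following line is missing: set_seed(args.seed, device_specific=True)

I forked the repository and added this line, and the problem went away.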
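
To illustrate why the missing call matters, here is a minimal sketch using only the Python standard library. It does not use trl, accelerate, or ms-swift; `sample_tokens` and `per_rank_seed` are hypothetical helpers, and the "seed plus process index" offset is my understanding of what `device_specific=True` does, stated here as an assumption.

```python
import random


def sample_tokens(seed: int, n: int = 8) -> list[int]:
    """Stand-in for sampling n token ids from a model (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.randrange(50_000) for _ in range(n)]


def per_rank_seed(base_seed: int, rank: int, device_specific: bool) -> int:
    """Assumed behaviour: a device-specific seed offsets the base seed by the rank."""
    return base_seed + rank if device_specific else base_seed


base_seed = 42
world_size = 4  # number of simulated processes

# Every rank uses the same seed: all ranks draw the same "completion".
shared = [sample_tokens(per_rank_seed(base_seed, r, False)) for r in range(world_size)]

# Each rank uses its own seed: the ranks draw different "completions".
distinct = [sample_tokens(per_rank_seed(base_seed, r, True)) for r in range(world_size)]

print("unique samples with a shared seed:", len({tuple(s) for s in shared}))
print("unique samples with per-rank seeds:", len({tuple(s) for s in distinct}))
```

With a shared seed every simulated rank produces the same sequence, which matches the duplicate-sample symptom described in this issue; with per-rank seeds the sequences differ.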

@MrToy MrToy mentioned this issue Feb 17, 2025
4 tasks
@Jintao-Huang
Collaborator

OK, thanks.

@MrToy MrToy closed this as completed Feb 17, 2025