Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add DPO support for DeepSpeed-Chat (#828)
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat. * Add training scripts for step2 DPO in DeepSpeed-Chat. * Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat. * Update training scripts of step2 DPO in DeepSpeed-Chat. * Follow upstream fixes. * Update README.md for Step2 DPO finetuning. * Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat. * Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat. --------- Co-authored-by: Logan Adams <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
- Loading branch information