Question about how the training_rewards_accuracies curve is computed #6539

XueyangFeng started this conversation in General

XueyangFeng
Jan 6, 2025

How is training_rewards_accuracies computed after DPO or KTO training? Is it pi(y_chosen)/pi_ref(y_chosen)?

Replies: 1 comment

hiyouga
Jan 6, 2025
Maintainer

pi(y_chosen)>pi(y_rejected)
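
To make the comparison concrete, here is a minimal sketch of how a DPO-style reward accuracy is commonly computed. It assumes the implicit reward is `beta * (log pi(y|x) - log pi_ref(y|x))`, as in the DPO formulation, so the comparison is between the chosen and rejected rewards rather than between raw probabilities. The function name `reward_accuracy`, its parameters, and the sample numbers are hypothetical and for illustration only; this is not taken from this repository's code, and it covers only the paired (DPO) case, not KTO.

```python
def reward_accuracy(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Fraction of pairs whose chosen reward exceeds the rejected reward.

    Assumption: reward = beta * (policy log-prob - reference log-prob),
    where each log-prob is the summed sequence log-probability of a response.
    """
    hits = 0
    pairs = zip(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps)
    for pc, pr, rc, rr in pairs:
        chosen_reward = beta * (pc - rc)
        rejected_reward = beta * (pr - rr)
        # A pair counts as "accurate" when the chosen response is rewarded more.
        if chosen_reward > rejected_reward:
            hits += 1
    return hits / len(policy_chosen_logps)


# Made-up log-probabilities for a batch of four preference pairs.
policy_chosen = [-10.0, -12.0, -8.0, -15.0]
policy_rejected = [-11.0, -11.5, -9.0, -14.0]
ref_chosen = [-10.5, -12.0, -8.5, -15.5]
ref_rejected = [-10.5, -12.5, -8.5, -15.0]

print(reward_accuracy(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

Under this assumption the logged value is a per-batch mean of a 0/1 indicator, so it lies between 0 and 1 and is not a probability ratio such as pi(y_chosen)/pi_ref(y_chosen).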

