Question about how the training_rewards_accuracies curve is computed #6539
XueyangFeng
started this conversation in
General
Replies: 1 comment
pi(y_chosen) > pi(y_rejected)
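To make the comparison above concrete, here is a minimal sketch for the paired DPO case. It assumes the common TRL-style convention, which this thread does not confirm: each side of the comparison is an implicit reward, `beta * (log pi - log pi_ref)`, and the logged accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. The function name `reward_accuracy`, the `beta` value, and the sample log-probabilities are all hypothetical. KTO uses unpaired data, so its metric may be defined differently and is not covered here.

```python
def reward_accuracy(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Fraction of pairs whose chosen implicit reward beats the rejected one.

    Assumption: implicit reward = beta * (policy log-prob - reference log-prob),
    with log-probs summed over the response tokens.
    """
    hits = 0
    pairs = zip(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps)
    for pol_c, pol_r, ref_c, ref_r in pairs:
        chosen_reward = beta * (pol_c - ref_c)
        rejected_reward = beta * (pol_r - ref_r)
        # A pair counts as "accurate" when the chosen response is rewarded more.
        hits += chosen_reward > rejected_reward
    return hits / len(policy_chosen_logps)


# Hypothetical sequence log-probabilities for a batch of four preference pairs.
policy_chosen = [-10.0, -12.0, -8.0, -15.0]
policy_rejected = [-11.0, -11.5, -9.0, -14.0]
ref_chosen = [-10.5, -12.0, -9.0, -14.0]
ref_rejected = [-10.5, -12.0, -8.5, -14.5]

accuracy = reward_accuracy(policy_chosen, policy_rejected,
                           ref_chosen, ref_rejected)
print(accuracy)
```

Under this assumption the metric is a comparison between the chosen and rejected responses, not the single ratio pi(y_chosen)/pi_ref(y_chosen); that ratio only enters as one side of the comparison.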
How is training_rewards_accuracies computed after DPO or KTO training? Is it pi(y_chosen)/pi_ref(y_chosen)?