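When RewardModel computes the difference between two responses, `end_ind` is calculated as `end_ind = max(one_ind, two_ind)`. Why not obtain it directly from the last position at which `one_input_ids` and `two_input_ids` differ, that is, `check_divergence[-1]`?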
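To make the question concrete, here is a minimal sketch of the two candidate computations on toy data. It is not the repository's code: it uses plain Python lists instead of tensors, and `PAD_ID`, `first_pad_index`, and `divergence_positions` are hypothetical helpers. It also assumes right-padding only and that `one_ind` / `two_ind` are the index of the first padding token in each sequence.

```python
PAD_ID = 0  # hypothetical padding token id (assumption)


def first_pad_index(ids, pad_id=PAD_ID):
    """Return the index of the first padding token, or len(ids) if none."""
    for i, tok in enumerate(ids):
        if tok == pad_id:
            return i
    return len(ids)


def divergence_positions(a, b):
    """Return every position at which the two sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Case A: same length, identical trailing token (9) after the differing one.
one_input_ids = [5, 6, 7, 9, PAD_ID, PAD_ID]
two_input_ids = [5, 6, 8, 9, PAD_ID, PAD_ID]

one_ind = first_pad_index(one_input_ids)
two_ind = first_pad_index(two_input_ids)
check_divergence = divergence_positions(one_input_ids, two_input_ids)

end_ind = max(one_ind, two_ind)    # end of the longer response
last_diff = check_divergence[-1]   # last position where the tokens differ

# Case B: different lengths; the longer response differs from the padding
# of the shorter one all the way to its last token.
one_b = [5, 6, 7, PAD_ID, PAD_ID, PAD_ID]
two_b = [5, 6, 8, 9, 10, PAD_ID]

end_ind_b = max(first_pad_index(one_b), first_pad_index(two_b))
last_diff_b = divergence_positions(one_b, two_b)[-1]

print(end_ind, last_diff)      # Case A
print(end_ind_b, last_diff_b)  # Case B
```

Under these assumptions, the two bounds coincide (as an exclusive slice end, `last_diff + 1`) when the responses have different lengths, as in Case B. In Case A, where both responses have the same length and share a trailing token, `max(one_ind, two_ind)` still covers the shared tail while `check_divergence[-1]` stops at the last differing token.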