Given the latency of evaluating R1 compared to other datasets, it makes sense to split the work into separate evaluation runs by category type. This sub-issue covers running R1 evals for datasets in the games category.