
Run r1 evals for algorithmic datasets #221

Open
joesharratt1229 opened this issue Feb 26, 2025 · 0 comments
joesharratt1229 commented Feb 26, 2025

Given the latency of evaluating R1 compared to other models, it makes sense to segment this work into evaluation runs by category type. This sub-issue covers running R1 evals for datasets in the algorithmic category.
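A minimal sketch of the category split described above. The dataset names, categories, and `select_datasets` helper below are all hypothetical illustrations, not part of the project's actual API:

```python
# Hypothetical registry mapping dataset names to category types.
# None of these names come from the issue itself; they only
# illustrate selecting one category's datasets for a slow eval run.
DATASETS = {
    "graph_coloring": "algorithmic",
    "string_manipulation": "algorithmic",
    "basic_arithmetic": "arithmetic",
}

def select_datasets(category, registry=DATASETS):
    """Return the dataset names in one category, so an expensive model
    such as R1 can be evaluated one category at a time."""
    return sorted(name for name, cat in registry.items() if cat == category)

print(select_datasets("algorithmic"))
```

Running the full eval suite per category rather than all at once keeps each R1 job bounded and lets categories be tracked as separate sub-issues, as proposed here.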

@joesharratt1229 joesharratt1229 self-assigned this Feb 26, 2025
@joesharratt1229 joesharratt1229 changed the title Run r1 evals for algorithmic dataset Run r1 evals for algorithmic datasets Feb 26, 2025