
What is the Aha moment in R1-V #128

Open
ruolinsss opened this issue Feb 24, 2025 · 0 comments
I tried to reproduce the results from this repo, and everything ran smoothly.

At first I used the Clevr_CoGenT_TrainA_R1 dataset. However, I observed that with this dataset, the format loss and completion length never increased.

[screenshot: training curves for Clevr_CoGenT_TrainA_R1]

Next, I switched to Clevr_CoGenT_TrainA_70k_Complex, and the loss curves looked as expected. After reviewing debug_log_2b.txt, the log roughly matched the curves.
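To compare the curves against the log quantitatively, I measured completion lengths with a small helper. This is a minimal sketch that assumes each completion in debug_log_2b.txt is wrapped in `<think>...</think>` tags (the tag format is my guess; adjust the pattern if the actual log differs):

```python
import re
import statistics

def completion_lengths(log_text):
    """Extract <think>...</think> spans and return their lengths in characters.

    Assumes the debug log wraps each model completion in <think> tags;
    change the regex if the real log uses a different format.
    """
    spans = re.findall(r"<think>(.*?)</think>", log_text, flags=re.DOTALL)
    return [len(s) for s in spans]

# Two fake log entries standing in for debug_log_2b.txt content.
sample_log = (
    "<think>3 cubes</think>\n"
    "<think>Count the spheres: 2, 4... so the answer is 4</think>"
)
lengths = completion_lengths(sample_log)
print(statistics.mean(lengths))
```

A rising mean over the course of training would line up with the completion-length curve in the screenshot.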

[screenshot: training curves for Clevr_CoGenT_TrainA_70k_Complex]

At ~30% of the steps:

[log excerpt]

At ~50% of the steps:

[log excerpt]

At ~100% of the steps:

[log excerpt]

I have some questions based on these two experiments:

  1. Do the performance differences between the two datasets align with expectations?
  2. What exactly is the "aha moment" here? Should I be looking for something like "wait..." in the completions to mark the breakthrough?
  3. Based on my experiment with the Clevr_CoGenT_TrainA_70k_Complex dataset, should I use the 1000-step checkpoint (the one with the highest completion length) for evaluation?
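For question 2, here is how I tried to check for reflective markers in the completions. This is a minimal sketch, and the marker list is my own assumption about what "aha moment" phrases might look like, not something taken from the repo:

```python
import re

# Hypothetical reflective markers; the actual "aha" phrasing (if any)
# would depend on the model and training run.
AHA_MARKERS = ["wait", "let me re-check", "on second thought", "actually"]

def count_aha_markers(completions):
    """Count case-insensitive occurrences of each marker across completions."""
    counts = {m: 0 for m in AHA_MARKERS}
    for text in completions:
        lowered = text.lower()
        for m in AHA_MARKERS:
            counts[m] += len(re.findall(re.escape(m), lowered))
    return counts

# Two fake completions standing in for debug_log_2b.txt entries.
sample = [
    "<think>There are 3 cubes. Wait, one is a cylinder...</think>",
    "<think>The answer is 5. Actually, let me re-check the count.</think>",
]
print(count_aha_markers(sample))
```

If marker counts grow over training steps, that might be the kind of breakthrough to look for; in my runs so far I have not seen an obvious jump.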

Thanks a lot!
