
What is the Aha moment in R1-V #128

Open
ruolinsss opened this issue Feb 24, 2025 · 0 comments
I tried to reproduce the results from this repo, and everything ran smoothly.

At first I used the Clevr_CoGenT_TrainA_R1 dataset. However, I observed that with this dataset, the format loss and completion length never increased.

[screenshot: training curves for Clevr_CoGenT_TrainA_R1]

Next, I switched to Clevr_CoGenT_TrainA_70k_Complex, and the loss curves looked as expected. After reviewing debug_log_2b.txt, the log roughly matched the curves.
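To compare the curves against the log quantitatively, I measured completion lengths with a small helper. This is a minimal sketch that assumes each completion in debug_log_2b.txt is wrapped in `<think>...</think>` tags (the tag format is my guess; adjust the pattern if the actual log differs):

```python
import re
import statistics

def completion_lengths(log_text):
    """Extract <think>...</think> spans and return their lengths in characters.

    Assumes the debug log wraps each model completion in <think> tags;
    change the regex if the real log uses a different format.
    """
    spans = re.findall(r"<think>(.*?)</think>", log_text, flags=re.DOTALL)
    return [len(s) for s in spans]

# Two fake log entries standing in for debug_log_2b.txt content.
sample_log = (
    "<think>3 cubes</think>\n"
    "<think>Count the spheres: 2, 4... so the answer is 4</think>"
)
lengths = completion_lengths(sample_log)
print(statistics.mean(lengths))
```

A rising mean over the course of training would line up with the completion-length curve in the screenshot.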

[screenshot: training curves for Clevr_CoGenT_TrainA_70k_Complex]

At ~30% of the steps:

[log excerpt]

At ~50% of the steps:

[log excerpt]

At ~100% of the steps:

[log excerpt]

I have some questions based on these two experiments:

  1. Do the performance differences between the two datasets align with expectations?
  2. What exactly is the "aha moment" here? Should I be looking for something like "wait..." in the completions to mark the breakthrough?
  3. Based on my experiment with the Clevr_CoGenT_TrainA_70k_Complex dataset, should I use the 1000-step checkpoint (the one with the highest completion length) for evaluation?
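For question 2, here is how I tried to check for reflective markers in the completions. This is a minimal sketch, and the marker list is my own assumption about what "aha moment" phrases might look like, not something taken from the repo:

```python
import re

# Hypothetical reflective markers; the actual "aha" phrasing (if any)
# would depend on the model and training run.
AHA_MARKERS = ["wait", "let me re-check", "on second thought", "actually"]

def count_aha_markers(completions):
    """Count case-insensitive occurrences of each marker across completions."""
    counts = {m: 0 for m in AHA_MARKERS}
    for text in completions:
        lowered = text.lower()
        for m in AHA_MARKERS:
            counts[m] += len(re.findall(re.escape(m), lowered))
    return counts

# Two fake completions standing in for debug_log_2b.txt entries.
sample = [
    "<think>There are 3 cubes. Wait, one is a cylinder...</think>",
    "<think>The answer is 5. Actually, let me re-check the count.</think>",
]
print(count_aha_markers(sample))
```

If marker counts grow over training steps, that might be the kind of breakthrough to look for; in my runs so far I have not seen an obvious jump.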

Thanks a lot!
