I tried to reproduce the results from this repo, and everything went smoothly.
At first I used the Clevr_CoGenT_TrainA_R1 dataset. However, with this dataset the format loss and completion length never increased.
Next I switched to Clevr_CoGenT_TrainA_70k_Complex, and the loss curves looked as expected. After reviewing debug_log_2b.txt, the log roughly matched the curves.
At ~30% of the steps:
At ~50% of the steps:
At ~100% of the steps:
I have some questions based on these two experiments:
I’m wondering if the differences in performance between the datasets align with expectations.
What exactly is the "aha moment" here? Should I be looking for something like "wait..." in the completions to mark the breakthrough?
And based on my experiment with the Clevr_CoGenT_TrainA_70k_Complex dataset, should I use the 1000-step checkpoint (the one with the highest completion length) for evaluation?
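For the checkpoint question, one way to decide programmatically is to parse the training log and pick the step with the longest mean completion. Below is a minimal sketch; the log-line format (`step=N completion_length=X`) is a hypothetical placeholder, since the actual layout of debug_log_2b.txt may differ:

```python
import re

# Hypothetical log excerpt; adapt the pattern to the real debug_log_2b.txt format.
log_text = """\
step=500 completion_length=113.2
step=1000 completion_length=187.6
step=1500 completion_length=142.9
"""

pattern = re.compile(r"step=(\d+) completion_length=([\d.]+)")

# Parse (step, mean completion length) pairs from the log.
records = [(int(step), float(length)) for step, length in pattern.findall(log_text)]

# Pick the checkpoint step with the highest mean completion length.
best_step, best_len = max(records, key=lambda r: r[1])
print(best_step)  # -> 1000
```

This only automates the "highest completion" heuristic from the question above; whether that checkpoint is actually best should still be confirmed on a validation split.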
Thanks a lot!