Hello! Our team has been working on reproducing SkyThought, which we find groundbreaking and insightful. We successfully replicated the results with the Qwen2.5-32B model using the LLaMA-Factory codebase and observed significant performance improvements.
However, we ran into difficulties when applying the same training to Llama-3.3-70B. Contrary to our expectations, we did not observe a notable performance boost, and performance on the MATH500 benchmark even declined slightly.
We would greatly appreciate any guidance or insights. Thank you!
Have you tested the eval scripts? Related issue: "I can't reproduce Qwen/QwQ-32B-Preview accuracy on AIME with eval scripts". Can you reproduce Qwen/QwQ-32B-Preview accuracy on AIME with the eval scripts?
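For a quick sanity check on the eval side, a minimal sketch along these lines may help. This is not the SkyThought eval script itself: the vLLM calls are standard, but the dataset id (`Maxwell-Jia/AIME_2024`), its column names, the prompt template, and the boxed-answer extraction below are assumptions you would need to adapt to your setup.

```python
# Minimal sanity check of Qwen/QwQ-32B-Preview on AIME-style problems.
# Assumptions (not from the SkyThought repo): dataset id and column names
# ("Problem", "Answer") are placeholders; adjust tensor_parallel_size to
# your GPU count and the prompt/answer convention to match the real eval.
import re

from datasets import load_dataset
from vllm import LLM, SamplingParams

MODEL = "Qwen/QwQ-32B-Preview"


def extract_boxed(text: str) -> str | None:
    """Return the last \\boxed{...} content (no nested braces), a common math-answer convention."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def main() -> None:
    data = load_dataset("Maxwell-Jia/AIME_2024", split="train")  # placeholder dataset id
    llm = LLM(model=MODEL, tensor_parallel_size=8)
    params = SamplingParams(temperature=0.0, max_tokens=16384)

    prompts = [
        f"{row['Problem']}\nPlease reason step by step, and put your final answer within \\boxed{{}}."
        for row in data
    ]
    outputs = llm.generate(prompts, params)

    correct = 0
    for row, out in zip(data, outputs):
        pred = extract_boxed(out.outputs[0].text)
        if pred is not None and pred == str(row["Answer"]).strip():
            correct += 1
    print(f"AIME accuracy: {correct}/{len(data)} = {correct / len(data):.2%}")


if __name__ == "__main__":
    main()
```

If even a rough pass like this diverges noticeably from the reported numbers, the generation settings (prompt template, temperature, max tokens) are usually the first things to check before suspecting the fine-tuned checkpoint.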