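Feature request
The MiniCPM technical report states: "During the pre-training stage, we use only large-scale, general-purpose, coarse-quality pre-training data; during the annealing stage, we mix a very broad range of high-quality knowledge and ability-oriented data, together with high-quality SFT data, into the pre-training data for annealing."
Does MiniCPM3 use the same training method? Have you tried adding high-quality data during the Stable stage (or switching to a Cosine schedule to lower the learning rate)?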
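For readers less familiar with the two options in the question, here is a minimal sketch of a Warmup-Stable-Decay style schedule next to a cosine schedule, with a data-mix rule that adds high-quality data only during the decay (annealing) phase. Everything in it is an illustrative assumption: the function names, the linear decay shape, the step counts, the peak learning rate, and the 50% high-quality fraction are hypothetical and are not taken from the MiniCPM or MiniCPM3 training setup.

```python
import math


def wsd_lr(step, total_steps, peak_lr=0.01, warmup_steps=100,
           decay_steps=1000, min_lr=0.0):
    """Warmup-Stable-Decay style schedule (decay shape is an assumption).

    Linear warmup, constant plateau at peak_lr, then a linear decay to
    min_lr over the final decay_steps steps.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return peak_lr  # Stable stage: learning rate stays at its peak
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


def cosine_lr(step, total_steps, peak_lr=0.01, warmup_steps=100, min_lr=0.0):
    """Linear warmup followed by cosine decay over the rest of training."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def high_quality_fraction(step, total_steps, decay_steps=1000,
                          hq_frac_in_decay=0.5):
    """Hypothetical mixing rule: high-quality data only in the decay phase."""
    return hq_frac_in_decay if step >= total_steps - decay_steps else 0.0


if __name__ == "__main__":
    total = 10_000
    for s in (0, 50, 5_000, 9_000, 9_500, 10_000):
        print(s, wsd_lr(s, total), cosine_lr(s, total),
              high_quality_fraction(s, total))
```

The variant asked about above would correspond to returning a nonzero value from `high_quality_fraction` during the Stable stage as well, or to replacing `wsd_lr` with `cosine_lr` while keeping the same data mix.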