Training data of model-based filtering #74

Open
Yu-Shi opened this issue Sep 6, 2024 · 3 comments

Yu-Shi commented Sep 6, 2024

Hi authors, I'm interested in researching the model-based filtering process and reproducing the training of the fastText model used for DCLM-baseline. Could you please provide the training data for it?

Yu-Shi commented Sep 6, 2024

Or can you provide more information on how the training data was constructed, so that others can reproduce a similar model?

Yu-Shi commented Oct 13, 2024

Hi authors, are there any updates on this? Thank you.

afang-story (Contributor) commented

I don't think we can release the data, but we will update if this changes. You can find OpenHermes-2.5 on Hugging Face, and instructions for reproducing the ELI5 portion are in Appendix I.1 in the paper.
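
For anyone assembling a similar training set from those public sources, here is a minimal sketch of writing examples in the `__label__` line format that fastText's supervised mode expects. It is not the DCLM pipeline: the label names `hq` and `lq`, the helper names, and the sample strings are hypothetical, and the source of negative examples is left to the reader.

```python
import re
from typing import Iterable, List

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as a single fastText supervised-training line."""
    # fastText reads one example per line, so collapse newlines,
    # tabs, and repeated spaces into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    return f"{LABEL_PREFIX}{label} {flat}"


def build_training_lines(
    positives: Iterable[str], negatives: Iterable[str]
) -> List[str]:
    """Label positives as 'hq' and negatives as 'lq' (hypothetical names)."""
    lines = [to_fasttext_line("hq", t) for t in positives]
    lines += [to_fasttext_line("lq", t) for t in negatives]
    return lines


if __name__ == "__main__":
    # Placeholder documents; real positives would come from sources such as
    # OpenHermes-2.5 and ELI5, and negatives from a corpus of your choosing.
    positives = ["Q: Why is the sky blue?\nA: Rayleigh scattering."]
    negatives = ["click here   to win\ta prize"]
    for line in build_training_lines(positives, negatives):
        print(line)
```

Writing these lines to a text file gives an input that `fasttext.train_supervised(input=...)` can read. The hyperparameters are not specified in this thread, so they would need to be taken from the paper or tuned.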
