Training data of model-based filtering #74

Yu-Shi · 2024-09-06T08:42:07Z

Hi authors, I'm interested in researching the model-based filtering process and in reproducing the training of the fastText classifier used for DCLM-Baseline. Could you please release the training data for it?

Yu-Shi · 2024-09-06T08:44:13Z

Alternatively, could you provide more detail on how the training data was constructed, so that others can reproduce a similar model?

Yu-Shi · 2024-10-13T20:08:48Z

Hi authors, are there any updates on this? Thank you.

afang-story · 2024-10-31T01:34:21Z

I don't think we can release the data, but we will update this issue if that changes. You can find OpenHermes-2.5 on Hugging Face, and instructions for reproducing the ELI5 portion are in Appendix I.1 of the paper.
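For anyone rebuilding a similar training set from the public sources mentioned in the reply, here is a minimal sketch of the file-preparation step. It assumes fastText's supervised input format (one example per line, each prefixed with a `__label__` tag), treats OH-2.5 + ELI5 text as the positive class, and assumes you supply some pool of ordinary web documents as negatives. The label names `__label__hq` / `__label__cc` and all helper names are hypothetical; this is not the authors' pipeline.

```python
import random
import re

# Hypothetical label names; fastText only requires the "__label__" prefix.
POS_LABEL = "__label__hq"  # assumed positives: OH-2.5 + ELI5 text
NEG_LABEL = "__label__cc"  # assumed negatives: generic web documents


def normalize(text: str) -> str:
    """Collapse all whitespace (including newlines) into single spaces.

    The supervised format is one example per line, so embedded
    newlines must be removed from each document.
    """
    return re.sub(r"\s+", " ", text).strip()


def labeled(label: str, texts):
    """Yield '<label> <text>' lines, skipping documents that are empty after cleanup."""
    for t in texts:
        clean = normalize(t)
        if clean:
            yield f"{label} {clean}"


def to_fasttext_lines(positives, negatives, seed: int = 0):
    """Combine both classes and shuffle deterministically."""
    lines = list(labeled(POS_LABEL, positives)) + list(labeled(NEG_LABEL, negatives))
    random.Random(seed).shuffle(lines)
    return lines


def write_training_file(path: str, positives, negatives, seed: int = 0) -> None:
    """Write the shuffled examples to a UTF-8 text file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_fasttext_lines(positives, negatives, seed):
            f.write(line + "\n")
```

The resulting file can then be passed to the `fasttext` Python package, e.g. `fasttext.train_supervised(input="train.txt", wordNgrams=2)`; the arguments shown are placeholders, not the settings used for DCLM-Baseline.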

Mivg self-assigned this Sep 7, 2024

Mivg mentioned this issue Sep 14, 2024

The dataset for training fastText OH-2.5 +ELI5 text classifier #75 (Open)
