How to generate subwords from a raw file? #8

AMR-KELEG · 2021-11-16T11:15:32Z

Hi,

I am really excited about this package, and I am wondering whether you could add some instructions for generating a subwords file from another raw file (one sentence per line).
Should I use the subword tokenizer of a multilingual BERT model? I think using different tokenizers might change which common subwords are detected.
Thanks 😄
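To illustrate why the choice of tokenizer matters, here is a minimal sketch of WordPiece-style greedy longest-match segmentation (the scheme BERT vocabularies use, with `##` marking word-internal pieces) applied to one-sentence-per-line text. The two vocabularies are tiny hypothetical examples, not real model vocabularies, and the output format (space-separated subwords, one line per input sentence) is an assumption, not necessarily the format this package expects. `subword_lines` accepts any iterable of strings, so an open file object works in place of the list.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation (WordPiece style).
    Word-internal pieces carry the '##' prefix, as in BERT vocabularies."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def subword_lines(lines, vocab):
    """Map raw sentences (one per line) to space-separated subwords per line."""
    out = []
    for line in lines:
        tokens = []
        for word in line.split():
            tokens.extend(wordpiece(word, vocab))
        out.append(" ".join(tokens))
    return out


# Two hypothetical toy vocabularies (for illustration only).
vocab_a = {"play", "##ing", "##er", "un", "##play", "##able"}
vocab_b = {"pla", "##ying", "##yer", "unp", "##layable"}

raw = ["playing player", "unplayable"]
print(subword_lines(raw, vocab_a))  # ['play ##ing play ##er', 'un ##play ##able']
print(subword_lines(raw, vocab_b))  # ['pla ##ying pla ##yer', 'unp ##layable']
```

With `vocab_a`, "playing" and "player" share the subword `play`; with `vocab_b` they share only `pla`, so any statistics over "common subwords" depend on which vocabulary produced the segmentation.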
