Hi,
I am really excited about this package. Could you add some instructions for generating a subwords file from a raw text file (one sentence per line)?
Should I use the subword tokenization model for a multilingual BERT? I think using different tokenizers might affect the way common subwords are detected.
Thanks 😄