About train corpus format #10
Comments
Using the same processing steps as standard NMT systems is fine (e.g., tokenization and BPE for English). You may refer to the user manual at https://github.com/THUNLP-MT/THUMT for details.
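To make the BPE step above concrete, here is a minimal, self-contained sketch of how BPE merge operations are learned (following the algorithm behind the `subword-nmt` tool that such pipelines typically use). The word-frequency dictionary and merge count are illustrative assumptions, not this project's actual preprocessing; in practice you would run `subword-nmt learn-bpe` / `apply-bpe` on your tokenized corpus.

```python
# Sketch of BPE merge learning (after Sennrich et al.'s subword-nmt).
# The toy corpus and merge count below are illustrative assumptions,
# not THUMT's actual preprocessing configuration.
import collections
import re


def get_pair_stats(vocab):
    """Count adjacent symbol pairs over a {space-joined word: freq} vocab."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge operations from a word-frequency dict."""
    # Start from characters, with an end-of-word marker.
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
print(merges[0])  # first learned merge operation
```

The learned merge list is what `apply-bpe` would later use to segment the training, validation, and test corpora consistently.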
Thank you very much.
My e-mail: [email protected]
It seems that you only use one reference in validation, while the NIST test sets have 4 references. Using more references will result in higher BLEU scores.
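To illustrate why extra references can only raise the score: BLEU clips each n-gram count against the maximum count over all references, so any n-gram matched by an additional reference adds to the clipped precision. Below is a simplified pure-Python sketch of multi-reference sentence-level BLEU; the toy sentences are made up, and real NIST evaluation would use a standard script such as `multi-bleu.perl` over the full test set.

```python
# Simplified multi-reference sentence BLEU (illustrative sketch only;
# the example sentences are invented, not NIST data).
import collections
import math


def ngrams(tokens, n):
    return collections.Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def bleu(hypothesis, references, max_n=4):
    """Sentence-level BLEU with counts clipped against the best reference."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        max_ref = collections.Counter()
        for ref in refs:
            for gram, c in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], c)  # clip vs. best reference
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # tiny floor avoids log(0) on toy sentences (crude smoothing)
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    # brevity penalty against the reference length closest to the hypothesis
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) >= ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(log_prec)


refs = ["the cat sat on the mat", "a cat was on the mat"]
one = bleu("a cat sat on the mat", refs[:1])  # single reference
two = bleu("a cat sat on the mat", refs)      # both references
print(one, two)
```

With one reference the unigram "a" goes unmatched; the second reference supplies it, so the two-reference score is strictly higher for the same hypothesis.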
Hello~
When I use this code to train a model, what format should the source corpus, the target corpus, and the context corpus be in? Should they be tokenized and BPE-encoded? Could you send me a demo of it?
Thank you very much.