Hi, I noticed that for br, cy, mt, and ga (ru is fine), the translated texts tend to be one sentence shorter than their English originals. For example, here is one data point from the br training set:
<lex comment="good" lang="br" lid="Id1">E Saldevanahalli, Acharya Dr. Sarvapalli Radharrishnan Road, Hessarghatta Main Road, Bangalore - 560090 a zo lec'hiadur ar Institouenn-Tekoloù Acharya a-seizet e stad Karnataka, Indez e 2000.</lex>
<lex comment="good" lang="br" lid="Id2">Krouiñ e 2000 e c'hastouenn an arlañv an taktoniñ e 2000 e c'eo e c'hastral Saldevanahalli, Acharya e Dr. Sarvapalli Radhakrishnan Road, Hessarghatta Main Road, Bangalore, Karnataka, Indi, 560090.</lex>
<lex comment="good" lang="br" lid="Id3">Ar c'hwec'h a zo bet krouiñ e 2000 a zo bet ar Stitankad an takeladoù Acharya (moto : "Derezh-eñvoudur") ha e plijout e Soldevanahalli, Acharya e Dr. Sarvapalli Radhakrishnan Road, Hessarghatta Main Road, Bangalore - 560090, Karnataka, Indezia.</lex>
<lex comment="" lang="en" lid="Id4">In Soldevanahalli, Acharya Dr. Sarvapalli Radhakrishnan Road, Hessarghatta Main Road, Bangalore – 560090 is the location of the Acharya Institute of Technology established in the state of Karnataka, India in the year 2000. The Institute, whose motto is "Nurturing Excellence" is affiliated with the Visvesvaraya Technological University in the city of Belgaum.</lex>
<lex comment="" lang="en" lid="Id4">The Acharya Institute of Technology was established in 2000. Its campus is located in Soldevanahalli, Acharya Dr. Sarvapalli Radhakrishnan Road, Hessarghatta Main Road, Bangalore, Karnataka, India, 560090. It is motto is "Nurturing Excellence" and it is affiliated with the Visvesvaraya Technological University in Belgaum.</lex>
<lex comment="" lang="en" lid="Id4">Acharya Institute of Technology (motto: "Nurturing Excellence") was established in 2000 and is located at Soldevanahalli, Acharya Dr. Sarvapalli Radhakrishnan Road, Hessarghatta Main Road, Bangalore – 560090, Karnataka, India. The Institute is affiliated with Visvesvaraya Technological University of Belgaum.</lex>
For the examples with more than 3 triples as input, the percentage of cases with missing sentences is:
| Language | Percentage |
| --- | --- |
| BR | 46.03% |
| CY | 40.27% |
| GA | 53.09% |
| MT | 56% |
| RU | 12.2% |
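For anyone who wants to reproduce this kind of check, here is a minimal sketch of one way to measure it. It is not the script used for the numbers above. The sentence counter is a naive heuristic, the abbreviation list is illustrative, and pairing the English and translated `<lex>` elements by their order within an entry is an assumption about the file layout. The helper names (`count_sentences`, `pair_lex_by_order`, `shortfall_rate`) are hypothetical, and the sketch does not filter by number of input triples.

```python
import xml.etree.ElementTree as ET

# Illustrative abbreviation list (an assumption); extend it for real data.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "st.", "jr.", "sr."}
TERMINATORS = (".", "!", "?")


def count_sentences(text: str) -> int:
    """Naive sentence count: a token ending in . ! or ? closes a sentence
    if it is the last token or the next token starts with an uppercase letter."""
    tokens = text.split()
    if not tokens:
        return 0
    count = 0
    for i, token in enumerate(tokens):
        if not token.endswith(TERMINATORS) or token.lower() in ABBREVIATIONS:
            continue
        is_last = i == len(tokens) - 1
        if is_last or tokens[i + 1][0].isupper():
            count += 1
    # Text that trails off without final punctuation still counts as a sentence.
    if not tokens[-1].endswith(TERMINATORS):
        count += 1
    return count


def pair_lex_by_order(entry_xml: str, target_lang: str) -> list:
    """Pair English and target-language <lex> texts by position in the entry.
    Order-based pairing is an assumption, not something the dataset guarantees."""
    root = ET.fromstring(entry_xml)
    english = [lex.text or "" for lex in root.iter("lex") if lex.get("lang") == "en"]
    target = [lex.text or "" for lex in root.iter("lex") if lex.get("lang") == target_lang]
    return list(zip(english, target))


def shortfall_rate(pairs) -> float:
    """Percentage of pairs whose translation has fewer sentences than the English text."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    shorter = sum(1 for en, tr in pairs if count_sentences(tr) < count_sentences(en))
    return 100.0 * shorter / len(pairs)
```

To use it, collect the pairs from every entry of one language with `pair_lex_by_order(entry_xml, "br")` and pass the combined list to `shortfall_rate`. A proper sentence splitter would give more reliable counts than this heuristic, especially for addresses and abbreviations.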
Would you mind checking them? Cheers :)
Thank you for reaching out. I had not previously noticed this phenomenon in the training data, but I think it is most likely a side effect of the multilingual NMT system we use to translate from English to the new low-resource languages (br, cy, ga, mt). Translation quality is quite poor for these languages, so it wouldn't surprise me if decoding stops too early on the longer sequences.
Be aware that this silver training data we provide is likely to be very noisy and we only include it as an optional starting point. If you can think of alternative sources/methods to produce higher quality training data for your system we highly encourage you to do so. Feel free to let me know if you have any further questions.