There is a new paper, http://arxiv.org/abs/1601.00372, which tries to improve NMT with a mutual information objective, i.e. instead of modeling only P(y|x) it also models P(x|y). Since the two directions cannot easily be trained jointly, the authors trained two separate seq2seq networks, one for each direction, and combined their scores. They report an improvement of around 1 BLEU point.

When reading about SMT, I noticed that including this kind of mutual information term has a large impact on translation quality, both qualitatively and quantitatively. So I am not sure why they gained only about 1 BLEU point here. One simple extension could be to use the P(x|y) model during the attention mechanism.
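For concreteness, here is a minimal sketch of what this bidirectional scoring could look like as a reranking step. The `forward_model` and `backward_model` objects and their `log_prob` method are hypothetical placeholders, not an API from the paper or this repo; the weight `lam` is likewise an assumed hyperparameter.

```python
def rerank(source, candidates, forward_model, backward_model, lam=0.5):
    """Rerank candidate translations of `source` by a weighted sum of
    log P(y|x) from the forward model and log P(x|y) from the backward
    model (a sketch of the mutual-information idea discussed above)."""
    def score(candidate):
        # log P(y|x): probability of the candidate given the source
        forward = forward_model.log_prob(source, candidate)
        # log P(x|y): probability of reconstructing the source from the candidate
        backward = backward_model.log_prob(candidate, source)
        return forward + lam * backward

    # Return candidates best-first under the combined score
    return sorted(candidates, key=score, reverse=True)
```

This only touches the n-best list at decoding time, which is presumably why the gains are modest; pushing P(x|y) into the attention mechanism itself, as suggested above, would affect search directly.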
I guess I forgot to add this as an issue, but yes, training two models, one in each direction, and then weighting in the score of the reverse translation is definitely something I think we can do. I originally got the idea from this closely related paper: http://arxiv.org/abs/1510.03055
I don't think the comparison with SMT is very straightforward, though, because there the mutual information is normally used at the phrase level, not at the sentence level.