There is a new paper, http://arxiv.org/abs/1601.00372, which tries to improve NMT with a mutual information objective, i.e. instead of modeling only P(y|x) it also models P(x|y). Since the two directions cannot easily be trained jointly, the authors trained two separate seq2seq networks, one for each direction, and combined their scores. They report an improvement of around 1 BLEU point.

When reading about SMT, I noticed that including this kind of mutual information term has a large impact on translation quality, both qualitatively and quantitatively. So I am not sure why they gained only about 1 BLEU point here. One simple extension could be to use the P(x|y) model during the attention mechanism.
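For concreteness, here is a minimal sketch of what this bidirectional scoring could look like as a reranking step. The `forward_model` and `backward_model` objects and their `log_prob` method are hypothetical placeholders, not an API from the paper or this repo; the weight `lam` is likewise an assumed hyperparameter.

```python
def rerank(source, candidates, forward_model, backward_model, lam=0.5):
    """Rerank candidate translations of `source` by a weighted sum of
    log P(y|x) from the forward model and log P(x|y) from the backward
    model (a sketch of the mutual-information idea discussed above)."""
    def score(candidate):
        # log P(y|x): probability of the candidate given the source
        forward = forward_model.log_prob(source, candidate)
        # log P(x|y): probability of reconstructing the source from the candidate
        backward = backward_model.log_prob(candidate, source)
        return forward + lam * backward

    # Return candidates best-first under the combined score
    return sorted(candidates, key=score, reverse=True)
```

This only touches the n-best list at decoding time, which is presumably why the gains are modest; pushing P(x|y) into the attention mechanism itself, as suggested above, would affect search directly.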
I guess I forgot to add this as an issue, but yes, training two models, one in each direction, and then weighting in the score of the reverse translation is definitely something I think we can do. I originally got the idea from this closely related paper: http://arxiv.org/abs/1510.03055
I don't think the comparison with SMT is very straightforward, though, because there the mutual information is normally used at the phrase level, not at the sentence level.