
Maximum Mutual Information #8

Open
anirudh9119 opened this issue Jan 8, 2016 · 1 comment
Comments

@anirudh9119
Collaborator

There is a new paper, http://arxiv.org/abs/1601.00372, which tries to improve NMT by maximizing mutual information, i.e. not just modeling P(y|x) but also modeling P(x|y). Since these cannot be trained together directly, the authors trained two separate seq2seq networks, one for each direction. They report an improvement of around 1 BLEU point.

When reading about SMT, I noticed that including this mutual information has a large impact, both qualitatively (on translation quality) and quantitatively. I am not sure why they only gained about 1 BLEU point here. One simple extension could be to use the P(x|y) model during the attention mechanism.
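For reference, a rough sketch of the decoding objective this implies (my own paraphrase, not copied from the paper; λ is an interpolation weight and any length normalization is omitted): maximizing mutual information between the source x and the translation y works out to interpolating the two directional models,

```latex
\hat{y} = \arg\max_{y}\; (1 - \lambda)\,\log P(y \mid x) + \lambda\,\log P(x \mid y)
```

so at decoding time the reverse model P(x|y) just re-weights hypotheses proposed by the forward model.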

@bartvm
Owner

bartvm commented Jan 8, 2016

I forgot to add this as an issue I guess, but yes, training two models, one in each direction, and then weighting in the score of the reverse translation is definitely something I think we can do. I originally got the idea from this closely related paper: http://arxiv.org/abs/1510.03055

I don't think the comparison with SMT is very straightforward, though, because there mutual information is normally used at the phrase level rather than at the sentence level.
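A minimal sketch of how that reranking could look, assuming we already have an n-best list from the forward model and two functions exposing log-probabilities (`score_forward`, `score_reverse`, `rerank_nbest`, and `lam` are hypothetical names for illustration, not anything in this repo):

```python
def rerank_nbest(source, candidates, score_forward, score_reverse, lam=0.5):
    """Rerank an n-best list from the forward model using the reverse model.

    score_forward(source, target) -> log P(target | source)
    score_reverse(target, source) -> log P(source | target)
    lam weights the reverse-model score, as in the interpolated objective above.
    """
    def combined(target):
        return (1.0 - lam) * score_forward(source, target) + lam * score_reverse(target, source)

    # Best hypothesis first under the interpolated score.
    return sorted(candidates, key=combined, reverse=True)
```

The convenient part is that the two models can be trained completely independently; the reverse model only enters at decoding time.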
