This paper is the only one I'm aware of that has looked in depth at the differences between NMT and SMT. It reaches some interesting conclusions:
> [...] we found that the majority of the gains [after reranking] were related to improvements in the accuracy of transfer of correct grammatical structure to the target sentence [but] neural MT reranking had an overall negative effect on choice of terminology [...] the neural MT model tended to prefer more common words, mistaking "radiant heat" as "radiation heat" or "slipring" as "ring." While these tendencies will be affected by many factors such as the size of the vocabulary or the number and size of hidden layers of the net, we feel it is safe to say that neural MT reranking can be expected to have a large positive effect on syntactic correctness of output, while results for lexical choice are less conclusive.
These results were for reranking of SMT sentences using an NMT model, so it's hard to say whether these problems are more/less present when doing direct translation (my guess would be more).
A simple way of trying to address this problem could be to weight the target words using something like TF-IDF (which is basically TF if we take each sentence to be a document). This would force the model to pay more attention to rare words, rather than getting away with optimising for common ones.
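A minimal sketch of what this might look like, assuming a PyTorch-style training loop; the names `idf_weights` and `weighted_nll`, and the tensor shapes, are my own illustration rather than anything from the paper:

```python
import math
from collections import Counter

import torch
import torch.nn.functional as F

def idf_weights(corpus):
    """Per-word IDF over the target-side sentences, each treated as a
    'document'. TF within a single sentence is almost always 1, so the
    IDF term dominates: rare words get large weights, common words small."""
    doc_freq = Counter()
    for sentence in corpus:
        doc_freq.update(set(sentence.split()))
    n = len(corpus)
    return {w: math.log(n / df) for w, df in doc_freq.items()}

def weighted_nll(logits, targets, weight_table):
    """Per-token cross-entropy rescaled by the weight of each target word.
    logits: [batch, seq, vocab]; targets: [batch, seq];
    weight_table: [vocab] tensor mapping word id -> IDF weight."""
    flat_logits = logits.view(-1, logits.size(-1))
    flat_targets = targets.view(-1)
    loss = F.cross_entropy(flat_logits, flat_targets, reduction='none')
    weights = weight_table[flat_targets]
    # Normalise by the total weight so the loss scale stays comparable
    # to the unweighted objective.
    return (loss * weights).sum() / weights.sum()
```

Normalising by the summed weights keeps the effective learning rate roughly stable, so the only change is how the gradient budget is distributed across rare vs. common words.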
Evaluating whether this helps could be tricky; standard perplexity and BLEU scores won't easily show an improvement in these rarer failure cases, so it could be worthwhile to also evaluate with e.g. NIST, which weights n-gram matches by their informativeness.
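NLTK ships a NIST implementation that makes this cheap to try; the snippet below is just a toy illustration (the sentences and the n-gram order are made up, reusing the "radiant heat" error from the quote above):

```python
from nltk.translate.nist_score import sentence_nist

# A correct lexical choice vs. the "more common word" failure mode.
reference = "the panel emits radiant heat".split()
hypothesis = "the panel emits radiation heat".split()

# NIST rewards matches on informative (i.e. rare) n-grams, so getting
# "radiant" wrong costs more here than it would under BLEU.
print(sentence_nist([reference], hypothesis, n=2))
```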