This repository has been archived by the owner on Oct 18, 2021. It is now read-only.
After the discussion in #52, we ended up needing on-the-fly model selection based on evaluation metrics such as BLEU, which lets us save disk space and merge the evaluation process into the training script.
I'm looking into non-blocking computation of evaluation scores using Timer, where an evaluation script is executed regularly at a fixed time interval. Here is the sample output using a toy script.
starting...
At epoch 0, 0/10...
At epoch 0, 1/10...
At epoch 0, 2/10...
At epoch 0, 3/10...
At epoch 0, 4/10...
At epoch 0, 5/10...
At epoch 0, 6/10...
At epoch 0, 7/10...
Do validation !!
At epoch 0, 8/10...
At epoch 0, 9/10...
At epoch 1, 0/10...
At epoch 1, 1/10...
At epoch 1, 2/10...
At epoch 1, 3/10...
Validation is over
At epoch 1, 4/10...
At epoch 1, 5/10...
Do validation !!
At epoch 1, 6/10...
At epoch 1, 7/10...
At epoch 1, 8/10...
At epoch 1, 9/10...
At epoch 2, 0/10...
At epoch 2, 1/10...
Validation is over
At epoch 2, 2/10...
At epoch 2, 3/10...
Do validation !!
At epoch 2, 4/10...
At epoch 2, 5/10...
At epoch 2, 6/10...
At epoch 2, 7/10...
At epoch 2, 8/10...
At epoch 2, 9/10...
Validation is over
At epoch 3, 0/10...
At epoch 3, 1/10...
Do validation !!
At epoch 3, 2/10...
At epoch 3, 3/10...
At epoch 3, 4/10...
At epoch 3, 5/10...
At epoch 3, 6/10...
At epoch 3, 7/10...
Validation is over
At epoch 3, 8/10...
At epoch 3, 9/10...
Do validation !!
At epoch 4, 0/10...
At epoch 4, 1/10...
At epoch 4, 2/10...
^CTraceback (most recent call last):
File "timer_sample.py", line 52, in
train(max_epoch, rt)
File "timer_sample.py", line 43, in train
time.sleep(5) # your long-running job goes here...
KeyboardInterrupt
Validation is over
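The non-blocking setup above can be sketched with `threading.Timer`, where the validation job re-arms itself at a fixed interval while training runs in the main thread. This is only a minimal sketch; `timer_sample.py` itself isn't shown here, so the function names, interval, and step counts are all assumptions.

```python
import threading
import time

# Hypothetical interval between validation runs (assumption).
VALIDATION_INTERVAL = 2.0  # seconds

def validate():
    """Placeholder for the evaluation script (e.g. BLEU computation)."""
    print("Do validation !!")
    time.sleep(0.5)  # stand-in for the actual decoding/scoring work
    print("Validation is over")

def schedule_validation(stop_event):
    """Run validation, then re-arm the timer so it recurs periodically."""
    if stop_event.is_set():
        return
    validate()
    t = threading.Timer(VALIDATION_INTERVAL, schedule_validation,
                        args=(stop_event,))
    t.daemon = True  # don't keep the process alive after training ends
    t.start()

def train(max_epoch, steps_per_epoch=10):
    stop_event = threading.Event()
    t = threading.Timer(VALIDATION_INTERVAL, schedule_validation,
                        args=(stop_event,))
    t.daemon = True
    t.start()
    for epoch in range(max_epoch):
        for step in range(steps_per_epoch):
            print("At epoch %d, %d/%d..." % (epoch, step, steps_per_epoch))
            time.sleep(0.1)  # your long-running job goes here...
    stop_event.set()

if __name__ == "__main__":
    print("starting...")
    train(max_epoch=2)
```

Because the validation runs on a timer thread, its prints interleave with the training loop's output, which matches the log above.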
Also, I'm going to implement unknown-word replacement using alignment weights during translation.
Let me know if you have a better idea or an easier way to achieve these things.
I ran some experiments to check if computing BLEU on the fly (#61) and naive unknown word replacement (#60) work fine or not.
It seems like the BLEU score calculation works as intended though there's a possibility of bugs.
The unknown word replacement seems to be fine as well.
If we translate the following German sentence from newstest2013 (our validation set) into English,
Es ist auch ein Risikofaktor für mehrere andere Krebsarten .
whose reference translation is
It is also a risk factor for a number of others .
the output of the model is
It is also a Risikofaktor of several other Krebsarten .
You may see that the translated sentence contains two German words, i.e., Risikofaktor and Krebsarten, which were copied directly from the source sentence using alignment scores.
Due to the limited vocabulary size, both German words are out of vocabulary, meaning that the input sentence is fed into the model as follows.
Es ist auch ein &lt;unk&gt; für mehrere andere &lt;unk&gt; .
Note that the original output from the model before UNK replacement would be the following.
It is also a &lt;unk&gt; of several other &lt;unk&gt; .
As you can see, this sort of naive UNK replacement may work pretty well when the UNK tokens are proper nouns with the same surface form in both source and target languages.
However, that's not always the case.
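The replacement step described above can be sketched as follows. This assumes the decoder exposes per-target-position attention weights over source positions; the function name and data layout are hypothetical, not the actual implementation in this repository.

```python
UNK = "<unk>"

def replace_unks(source_words, target_words, attention):
    """Replace each generated <unk> by the source word with the highest
    attention (alignment) weight at that decoding step.

    attention[t][s] is the weight on source position s when emitting
    target position t.
    """
    output = []
    for t, word in enumerate(target_words):
        if word == UNK:
            # Copy the most strongly aligned *original* source word verbatim.
            s = max(range(len(source_words)), key=lambda i: attention[t][i])
            output.append(source_words[s])
        else:
            output.append(word)
    return output

# The original (pre-<unk>) source sentence from the example above.
source = "Es ist auch ein Risikofaktor für mehrere andere Krebsarten .".split()
target = "It is also a <unk> of several other <unk> .".split()
# Toy alignment for illustration: target position t attends to source position t.
attn = [[1.0 if s == t else 0.0 for s in range(len(source))]
        for t in range(len(target))]
print(" ".join(replace_unks(source, target, attn)))
# → It is also a Risikofaktor of several other Krebsarten .
```

Note that the attention is computed on the `<unk>`-substituted input, but the replacement copies from the original source words, which is what makes the copied proper nouns come out intact.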