This repository has been archived by the owner on Oct 18, 2021. It is now read-only.
One way of addressing unknown words in the input would be to predict the missing embedding from the context. This is effectively the same as saying that for these words we simply run a language model, and use that instead. Questions to look into:
What kind of language model: a simple language model, an RNN language model, or a bidirectional RNN language model?
Whether or not we should pre-train this language model on monolingual data (it might spend a lot of capacity on learning words that are already in the vocabulary, which is wasted capacity)
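As a concrete sketch of the bidirectional-RNN option above: run one RNN left-to-right over the context before the unknown word, another right-to-left over the context after it, and project the two final states down to an embedding. Everything below (sizes, the plain tanh recurrence, the single linear projection) is an illustrative assumption, not the repository's actual model.

```python
import numpy as np

# Illustrative sketch: predict an embedding for an unknown word from its
# context with two simple tanh RNNs. Sizes and weights are placeholders.
rng = np.random.default_rng(0)
EMB, HID = 16, 32  # embedding and hidden sizes (arbitrary)

# Random parameters standing in for trained weights.
W_xh = rng.normal(0, 0.1, (HID, EMB))       # input -> hidden
W_hh = rng.normal(0, 0.1, (HID, HID))       # hidden -> hidden
W_out = rng.normal(0, 0.1, (EMB, 2 * HID))  # [fwd; bwd] -> predicted embedding

def rnn_states(embs):
    """Run a simple tanh RNN over a sequence of embeddings, return all states."""
    h = np.zeros(HID)
    states = []
    for x in embs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        states.append(h)
    return states

def predict_unk_embedding(context_embs, unk_pos):
    """Predict an embedding for position `unk_pos`: a forward RNN over the
    left context, a backward RNN over the right context, then a linear
    projection of the two final hidden states."""
    fwd = rnn_states(context_embs[:unk_pos])            # left-to-right
    bwd = rnn_states(context_embs[unk_pos + 1:][::-1])  # right-to-left
    h_f = fwd[-1] if fwd else np.zeros(HID)
    h_b = bwd[-1] if bwd else np.zeros(HID)
    return W_out @ np.concatenate([h_f, h_b])

sentence = [rng.normal(size=EMB) for _ in range(5)]  # dummy word embeddings
pred = predict_unk_embedding(sentence, unk_pos=2)
```

In a real system the projection would be trained so that `pred` matches the embeddings of in-vocabulary words in the same contexts; the pre-training question above is then whether that training happens on monolingual data first.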
Reasons I think this could help:
Compared to getting embeddings from the character level, this could give more sensible embeddings to the encoder for e.g. proper nouns
Even if it only half-works, it's still better than feeding an UNK embedding, which could mean anything to the encoder (proper noun, rare word, typo, gibberish, etc.)
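The second point above is the key contrast: with a shared UNK vector, every out-of-vocabulary word looks identical to the encoder, whereas a context-predicted vector differs per occurrence. A minimal sketch of that substitution step, with a hypothetical `mean_ctx` stand-in for the contextual predictor and a toy vocabulary:

```python
import numpy as np

rng = np.random.default_rng(1)
EMB = 8
vocab = {"the": 0, "cat": 1, "sat": 2}
UNK = len(vocab)  # single shared id for every out-of-vocabulary word
emb_table = rng.normal(size=(len(vocab) + 1, EMB))

def encoder_input(tokens, predict_fn=None):
    """Build the encoder's input embeddings; if `predict_fn` is given,
    replace the shared UNK vector with a context-predicted one."""
    ids = [vocab.get(t, UNK) for t in tokens]
    embs = [emb_table[i].copy() for i in ids]
    if predict_fn is not None:
        for pos, i in enumerate(ids):
            if i == UNK:
                embs[pos] = predict_fn(embs, pos)
    return np.stack(embs)

# Hypothetical stand-in for the contextual predictor: mean of the context.
mean_ctx = lambda embs, pos: np.mean(
    [e for j, e in enumerate(embs) if j != pos], axis=0)

tokens = ["the", "Smith", "sat"]          # "Smith" is out of vocabulary
baseline = encoder_input(tokens)          # shared, context-independent UNK vector
contextual = encoder_input(tokens, mean_ctx)  # per-occurrence predicted vector
```

In-vocabulary positions are untouched; only the UNK position changes, which is exactly the "half-works is still better" argument: even a rough prediction carries more signal than one vector shared by proper nouns, rare words, typos, and gibberish alike.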
I did a quick experiment to see whether the word embeddings predicted from the context actually help (which also serves as a sanity check that my implementation is correct).
I ran with and without the contextual word embeddings, using vocabulary sizes of 100, 500, 1000, and 2000, and let the code run for 6 hours in each configuration. The validation error was lower with the contextual word embeddings when trained on Europarl only (the difference was approximately 3%).
Interestingly, the difference between the validation errors with and without contextual word embeddings was greater with vocabulary size 1000 than with 100 or 500, as expected.
So, can I conclude from this experiment that predicting word embeddings from the context is actually helping?