This repository has been archived by the owner on Oct 18, 2021. It is now read-only.

Unknown words in the input: Predicting from the context #3

Open
bartvm opened this issue Dec 2, 2015 · 1 comment

bartvm (Owner) commented Dec 2, 2015

One way of addressing unknown words in the input would be to predict the missing embedding from the context. This is effectively the same as saying that, for these words, we simply run a language model and use its prediction in place of the missing embedding. Questions to look into:

  • What kind of language model? A simple language model, an RNN language model, or a bidirectional RNN language model?
  • Whether or not we should pre-train this language model on monolingual data (it might spend a lot of capacity on learning words that are already in the vocabulary, which would be wasted capacity)
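A minimal sketch of the bidirectional-RNN variant (written in PyTorch; the module, its names, and the dimensions are my own assumptions, not code from this repository): run a bidirectional RNN over the sentence, take its hidden state at the UNK position, and project it back to embedding size, so the encoder receives a context-predicted vector instead of the generic UNK embedding.

```python
# Hypothetical sketch (PyTorch, not this repository's code): a bidirectional
# RNN "language model" over the context that predicts a replacement embedding
# for an UNK token, to be fed to the encoder instead of the generic UNK vector.
import torch
import torch.nn as nn

class ContextualUnkEmbedder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU reads the words to the left and right of the UNK.
        self.rnn = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Project the concatenated forward/backward states back to embedding size.
        self.project = nn.Linear(2 * hidden_dim, emb_dim)

    def forward(self, context_ids, unk_positions):
        # context_ids: (batch, seq_len) token ids, with unknown words mapped to UNK
        # unk_positions: (batch,) position of the unknown word in each sequence
        states, _ = self.rnn(self.embed(context_ids))      # (batch, seq_len, 2*hidden_dim)
        unk_states = states[torch.arange(context_ids.size(0)), unk_positions]
        return self.project(unk_states)                     # (batch, emb_dim)

# Example: predict embeddings for the UNKs at the given positions.
model = ContextualUnkEmbedder(vocab_size=1000, emb_dim=128, hidden_dim=256)
ids = torch.randint(0, 1000, (4, 20))
predicted = model(ids, torch.tensor([5, 7, 3, 12]))         # shape (4, 128)
```

Pre-training (the second question above) could then amount to fitting this module on monolingual data, with in-vocabulary words randomly masked out as UNK, before plugging it into the translation model.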

Reasons I think this could help:

  • Compared to getting embeddings from the character level, this could give the encoder more sensible embeddings for e.g. proper nouns
  • Even if it only half-works, it's still better than feeding an UNK embedding, which could mean anything to the encoder (proper noun, rare word, typo, gibberish, etc.)
anirudh9119 (Collaborator) commented

I did a quick experiment to see whether the word embeddings from the context were actually helping or not (which also serves as a check that my implementation is correct).

I ran training with and without the word embeddings from the context to see how much it helps, using vocabulary sizes of 100, 500, 1000, and 2000, and let the code run for 6 hours (both with and without contextual word embeddings). The validation error was lower with the contextual word embeddings when trained on Europarl only (the difference was approximately 3%).

Interestingly, the difference between the validation error with and without contextual word embeddings was larger with dictionary size 1000 than with 100 or 500, as expected.

So, can I conclude from this experiment that the word embeddings from the context are actually helping?

JinseokNam reopened this on Feb 3, 2016