Updated model.py #7

srijan-mishra · 2017-05-09T08:33:41Z

The disambiguate task was operating on the individual letters of the words in the sentence. I have changed it so that word2id now runs on whole words rather than on the individual letters of each word.
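
For anyone reading along, here is a rough sketch of why a plain string ends up being processed letter by letter. The `word2id` dictionary and the `lookup_ids` helper are made up for illustration and are not the code in model.py; the point is only that iterating over a Python string yields characters, while iterating over a list yields words.

```python
# Hypothetical vocabulary; the real word2id mapping lives in the model.
word2id = {"the": 0, "cat": 1, "sat": 2}


def lookup_ids(context):
    """Map each item of `context` to its id, skipping unknown items."""
    return [word2id[item] for item in context if item in word2id]


sentence = "the cat sat"

# Iterating over a str yields single characters, none of which are in word2id.
ids_from_string = lookup_ids(sentence)

# Iterating over a list of tokens yields whole words.
ids_from_tokens = lookup_ids(sentence.split())
```

With this toy vocabulary the string input finds no ids at all, while the token list finds one id per word.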

lopuhin · 2017-05-09T08:36:55Z

@srijan-mishra ah, thanks for the PR! Actually, I should have made it clearer that the context should already be tokenized. I would like to leave tokenization outside of "disambiguate", as it can differ between languages, and it's important for tokenization to be the same in training and inference.

So instead of doing tokenization via split, I would like to document that context should already be tokenized.
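
A minimal sketch of what keeping tokenization outside could look like, assuming a caller-side `tokenize` function (hypothetical, not part of this repository) that is reused for both training data and inference input:

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lowercase the text and extract word characters.

    A real project would pick a tokenizer suited to its language; what
    matters is that the same function is used in training and inference.
    """
    return re.findall(r"\w+", text.lower())


# The same tokenizer prepares the training contexts...
training_contexts = [tokenize(line) for line in ["The cat sat.", "A dog ran!"]]

# ...and the context passed in at inference time.
inference_context = tokenize("The dog sat?")
```

The resulting `inference_context` is a list of tokens, which is the form the context is expected to have.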

srijan-mishra · 2017-05-09T08:38:21Z

Awesome!

Great work here :)

lopuhin · 2017-05-09T08:45:11Z

@srijan-mishra great work with figuring it out!

Btw, I just thought that it would also be possible to raise an error if a string is passed instead of a list, so that anyone who misses the docs gets a clear error message.
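
Something along these lines might do it. This is only a sketch: `check_context` is a hypothetical helper name, and the choice of `TypeError` and the message wording are assumptions rather than what the repository ends up using.

```python
def check_context(context):
    """Reject a raw string so callers learn that context must be tokenized."""
    if isinstance(context, str):
        raise TypeError(
            "context must be a list of tokens, not a string; "
            "tokenize it before calling disambiguate"
        )
    return context
```

Calling this at the top of the disambiguation method would turn the silent letter-by-letter behaviour into an immediate, readable failure.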

Update model.py

98c3e75
