I created this project to scratch my own itch.
I love watching movies and am always keen to expand my vocabulary.
But it's hard to stop and look up an unknown word during a movie without spoiling the experience.
That's where subvoc comes in: search for a movie and discover its vocabulary.
Visit https://subvoc.stephanbehnke.com (hosted on Heroku, so it sometimes takes a few moments to start).
NOTE: The external API can be flaky; if it fails, you can view a cached analysis instead.
Screenshots: Homepage, Find Movie, List of words, Word details.
When you select a movie, subvoc queries the OpenSubtitles API for its subtitles. The result is then parsed, tokenized, and analyzed sentence by sentence and word by word with the help of the Python Natural Language Toolkit (NLTK). The difficulty of a word is determined by its relative frequency in the English language, on the assumption that more difficult words are simply used less often.
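The analysis described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not subvoc's actual code: it assumes the subtitles arrive as SRT text, swaps NLTK's tokenizers for simple regular expressions, and takes a hypothetical `frequencies` table that maps each word to its relative frequency in English.

```python
import re

# Matches SRT cue numbers ("12") and timestamp lines
# ("00:01:02,500 --> 00:01:04,000"), which carry no dialogue.
SRT_META = re.compile(r"^\d+$|^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")
# Simple stand-ins for NLTK's sentence and word tokenizers.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def subtitle_text(srt: str) -> str:
    """Drop cue numbers, timestamps and blank lines; join the dialogue."""
    lines = [line.strip() for line in srt.splitlines()]
    return " ".join(line for line in lines if line and not SRT_META.match(line))


def rank_by_difficulty(srt: str, frequencies: dict[str, float]) -> list[str]:
    """Return the distinct words, rarest (assumed most difficult) first.

    `frequencies` is a hypothetical table of relative word frequencies.
    """
    words = set()
    # Analyze sentence by sentence, then word by word.
    for sentence in SENTENCE_END.split(subtitle_text(srt)):
        words.update(WORD.findall(sentence.lower()))
    # Unknown words get frequency 0.0 and therefore count as hardest;
    # ties are broken alphabetically for a stable order.
    return sorted(words, key=lambda w: (frequencies.get(w, 0.0), w))
```

Sorting by ascending frequency puts rare words such as "dreadful" ahead of common ones such as "the", which matches the assumption that difficult words are used less.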
Features and roadmap:
- landing page with search bar
- search movie by query
- sort search results by popularity
- host on Heroku
- list of words sorted by difficulty
- use the base form of each word
- lazy load analysis
- show movie context for each word
- include movie poster
- support for idioms
- support for TV show episodes
- show context in another language side by side
- wild idea: display YouTube videos containing a certain word
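Two of the items above, using the base form of each word and showing movie context for each word, fit together: occurrences of "ran" and "running" can be grouped under "run" along with the sentences they appear in. A minimal sketch, assuming a hypothetical `base_form` lookup table in place of a real lemmatizer such as NLTK's `WordNetLemmatizer`:

```python
import re
from collections import defaultdict

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def contexts_by_base(text: str, base_form: dict[str, str]) -> dict[str, list[str]]:
    """Map each word's base form to the sentences it occurs in, in order.

    `base_form` is a hypothetical lookup table (e.g. {"ran": "run"});
    words missing from it are treated as their own base form.
    """
    contexts = defaultdict(list)
    for sentence in SENTENCE_END.split(text.strip()):
        seen = set()
        for word in WORD.findall(sentence.lower()):
            base = base_form.get(word, word)
            # Record each sentence only once per base form.
            if base not in seen:
                seen.add(base)
                contexts[base].append(sentence)
    return dict(contexts)
```

Keying by base form means the word list shows "run" once, with every sentence that used any of its inflections as context.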
To develop locally (requires Docker):
- run the server with `scripts/dev-py.sh`
- build the client with `scripts/dev-js.sh`
- run the tests with `scripts/test-py.sh` and `scripts/test-js.sh`
License: MIT (see LICENSE).