English is the only language of the IT community. As a non-native speaker, I know how painful it is to learn English and memorize English words. Many talented engineers are kept out of the community because they lack English skills. Most published IT English vocabulary books are either unrelated to the IT community or too academic. I am trying to create a basic English dictionary, drawn from the community, for programmers and software engineers. The assumption is that developers are able to:
- [ x ] Read and write posts on Stackoverflow.com
- [ x ] Read Hacker News
- [ ] Read and write comments and READMEs on github.com

Source | Newest Post | Oldest Post | Row Count | Size |
---|---|---|---|---|
HackerNews comments | 2015-10-13 08:44:02 UTC | 2006-10-09 19:51:01 UTC | 8399417 | 3.41 GB |
HackerNews stories | 2015-10-13 08:44:34 UTC | 2006-10-09 18:21:51 UTC | 1959809 | 402.71 MB |
StackOverflow answers | 2019-09-01 05:22:21.463 UTC | 2008-08-01 13:16:49.127 UTC | 27665009 | 22.27 GB |
StackOverflow questions | 2019-09-01 05:23:41.743 UTC | 2008-08-03 21:38:52.623 UTC | 18154493 | 28.13 GB |
- [ x ] Select the 16,000 most frequently used words from StackOverflow and HackerNews. Thanks to the Google Cloud public datasets and BigQuery, which saved me several days and many coffees.
- [ x ] Keep only English words by checking against a word list.
- [ x ] Exclude words that are too simple, using the VOA Special English and common 3000 word lists.
- [ x ] Group inflected forms under their root with the NLTK stem module.
- [ ] Fetch meanings from the chosen dictionary.
- [ ] Fetch example sentences from StackOverflow posts.
- [ x ] Output a Markdown file.
- [ x ] Output an Anki package file.
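
The counting, filtering, and grouping steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the names `build_vocabulary`, `naive_stem`, and `to_markdown` are hypothetical, the word lists are passed in as plain Python sets, and `naive_stem` is a crude stand-in so the sketch runs with the standard library only. A real stemmer, such as the `stem` method of NLTK's `PorterStemmer`, could be passed through the `stem` parameter instead.

```python
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Set

TOKEN_RE = re.compile(r"[a-z]+")


def naive_stem(word: str) -> str:
    """Crude stand-in stemmer (an assumption, not the NLTK algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_vocabulary(
    posts: Iterable[str],
    english_words: Set[str],
    simple_words: Set[str],
    top_n: int = 16000,
    stem: Callable[[str], str] = naive_stem,
) -> Dict[str, List[str]]:
    """Return {root: [inflected forms]} for frequent, non-trivial English words."""
    # 1. Count lowercase alphabetic tokens across all posts.
    counts = Counter(
        token for post in posts for token in TOKEN_RE.findall(post.lower())
    )
    # 2. Keep only the top_n most frequent tokens.
    frequent = [word for word, _ in counts.most_common(top_n)]
    # 3. Keep English words and drop those on the "too simple" list.
    kept = [w for w in frequent if w in english_words and w not in simple_words]
    # 4. Group the remaining words under a shared root, most frequent first.
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in kept:
        groups[stem(word)].append(word)
    return dict(groups)


def to_markdown(groups: Dict[str, List[str]]) -> str:
    """Render one Markdown bullet per root (hypothetical output layout)."""
    return "\n".join(
        f"- **{root}**: {', '.join(forms)}" for root, forms in groups.items()
    )
```

Making the stemmer a parameter keeps the grouping logic independent of any particular library, so the same function can be tried with a different stemmer or lemmatizer without other changes.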
readme.md is generated by a script. Do not edit this readme directly; edit the Python script and the corpus text instead.