Where to report when a word is missing from the dictionary? #86
Comments
Another example is ようこそ. |
Also 方 (かた) in the meaning "person". |
I used the words from JMdict for Kakugo. The problem is that the dictionary is HUGE, so I had to filter it. I chose to keep only words tagged "ichi1" (source), and even that alone gives a lot of entries. It's an arbitrary choice, but it's hard to find a criterion that keeps only the words useful to a learner... |
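The kind of filter described above can be sketched as follows. This is not the script used for Kakugo (which is not public); it is a minimal illustration that assumes the standard JMdict XML layout, where priority tags sit in `ke_pri` and `re_pri` elements. The inline sample, its `ent_seq` numbers, and the function names are made up for the example; the tags are the ones quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in JMdict's element layout (not real dictionary data;
# the ent_seq values are placeholders).
SAMPLE = """<JMdict>
<entry>
<ent_seq>1</ent_seq>
<r_ele><reb>ようこそ</reb><re_pri>ichi1</re_pri></r_ele>
<sense><gloss>welcome!</gloss></sense>
</entry>
<entry>
<ent_seq>2</ent_seq>
<k_ele><keb>彼氏</keb><ke_pri>news2</ke_pri><ke_pri>nf36</ke_pri><ke_pri>spec2</ke_pri></k_ele>
<r_ele><reb>かれし</reb><re_pri>news2</re_pri><re_pri>nf36</re_pri><re_pri>spec2</re_pri></r_ele>
<sense><gloss>boyfriend</gloss></sense>
</entry>
</JMdict>"""


def entry_priorities(entry):
    """Collect every priority tag attached to any kanji or reading element."""
    return {el.text for el in entry.iter() if el.tag in ("ke_pri", "re_pri")}


def filter_entries(root, wanted=frozenset({"ichi1"})):
    """Return the headword of each entry carrying at least one wanted tag."""
    kept = []
    for entry in root.iter("entry"):
        if entry_priorities(entry) & set(wanted):
            # Prefer the kanji form; fall back to the reading for kana-only words.
            headword = entry.findtext("k_ele/keb") or entry.findtext("r_ele/reb")
            kept.append(headword)
    return kept


root = ET.fromstring(SAMPLE)
print(filter_entries(root))                      # ichi1 only
print(filter_entries(root, {"ichi1", "news2"}))  # a wider, hypothetical filter
```

With the "ichi1"-only default, 彼氏 is dropped because none of its tags match, which is the behavior reported in this issue.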
要る is also missing. I think there is something wrong with your method of filtering the dictionary. |
Out of curiosity, I checked the given words in the latest English version of JMdict that I could find at the source URL above. ようこそ is listed as "ichi1", so I'm not sure what happened there; maybe the tags were different in the older JMdict. The tags for 彼氏 are news2, nf36, spec2, so it doesn't seem to be that popular: nf36 places it in the 36th block of 500 words by frequency, i.e. around rank 18,000. 方 is ichi1, but the translations are "direction, way". The tags for 要る are news2, nf27, spec1, so it makes sense that it's missing. A case could be made for adding words that have spec1 tags, and those with nf01 to nf10+, if they're not in the existing word list. |
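The rule suggested in the comment above (keep ichi1, add spec1, add nf01 through nf10) can be written as a small predicate. This is only a sketch of that suggestion, not anything Kakugo implements; the function name and the `max_nf` parameter are hypothetical, and the tag lists are the ones quoted in this thread.

```python
import re

# nfNN tags: NN is the number of the 500-word frequency block the word falls in.
NF_TAG = re.compile(r"^nf(\d{2})$")


def is_learner_word(priority_tags, max_nf=10):
    """True if any tag is ichi1, spec1, or an nf block no higher than max_nf."""
    for tag in priority_tags:
        if tag in ("ichi1", "spec1"):
            return True
        match = NF_TAG.match(tag)
        if match and int(match.group(1)) <= max_nf:
            return True
    return False


# Tags as reported in the comment above.
examples = {
    "ようこそ": ["ichi1"],
    "要る": ["news2", "nf27", "spec1"],
    "彼氏": ["news2", "nf36", "spec2"],
}
for word, tags in examples.items():
    print(word, is_learner_word(tags))
```

Under this rule 要る would be picked up through its spec1 tag, but 彼氏 would still be excluded, since it has neither ichi1, spec1, nor a low nf block.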
Hello, I'm posting here because this is related to the question of which data to use. The kanji 和 is listed under N3, but according to jisho it should actually be in N1: https://jisho.org/search/%E5%92%8C |
@blastrock: Is the code you used to generate the dictionary also open source? I poked around this repo and your other repos but couldn't find it. I would really like to adapt it in a fork to generate a new dictionary that includes a lot more vocab. I know you want to keep the size down because there are many "useless" words, but there are also many that I'm missing. I see the dictionary is a gzipped SQLite db, but I'm hoping I don't have to write my own scripts to add more vocab; I would just like to modify the filter you used. I started out learning kanji and vocab using only Kakugo, and I think it's by far the best app out there (thanks so much!). But now I'm attending Japanese classes, and I realize that the book they use (いろどり, which is free) requires me to learn a lot of vocab that is missing. A couple of examples just from the current chapter:
If the scripts or code you used to generate the dictionary are open source, I can adapt them and make my own fork. I'd be happy to make pull requests for any additional features I end up working on for myself (for example, I might add a different heuristic for auto-selecting vocab based on kanji, as the existing one selects thousands of vocab words once you know a few hundred kanji). |
The script to generate the dictionary is not open source because it is quite ugly. I don't mind sharing it with a few people though. I'll push it to a private repo and add you to it if you want. |
@blastrock, it would really be great if you could grant me access to that script. Thank you! |
Done. The repo is in a poor state, so don't hesitate to email me if you have any questions. |
Thank you, I'll report back! |
I recently worked on this. In the latest release, all ichi1 and news1 words are included. Each word now comes with multiple translations (much like the kanji test). It is also now possible to display words that are usually written in kana in their kanji form, like 下さい and many others. This doesn't completely solve this issue, but I think it greatly improves things. |
For example, I cannot find the word 彼氏, a common word in Japanese, and this is not the first time this has happened.