No fas.unicharset and fas.xheights files for Persian language #60
If you want to fix this issue, the fas.unicharset file can be extracted from fas.traineddata. IIRC, the xheights file is not needed for LSTM training.
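A minimal sketch of that extraction, assuming the `combine_tessdata` tool that ships with Tesseract is on your PATH. With `-e`, `combine_tessdata` picks the component from the suffix of the output filename, so `fas.lstm-unicharset` selects the LSTM character set (LSTM-only traineddata files usually store it as `lstm-unicharset` rather than `unicharset`; `combine_tessdata -d fas.traineddata` lists what your file actually contains). The helper names below are hypothetical and only build the command lines.

```python
import shutil
import subprocess
from pathlib import Path


def extract_lstm_unicharset_cmd(traineddata: str, out_prefix: str) -> list[str]:
    """Command that extracts only the lstm-unicharset component.

    combine_tessdata -e derives the component type from the output
    file's suffix, so "<prefix>lstm-unicharset" selects that component.
    """
    return ["combine_tessdata", "-e", traineddata, f"{out_prefix}lstm-unicharset"]


def unpack_all_cmd(traineddata: str, out_prefix: str) -> list[str]:
    """Command that unpacks every component, naming files with out_prefix."""
    return ["combine_tessdata", "-u", traineddata, out_prefix]


if __name__ == "__main__":
    cmd = extract_lstm_unicharset_cmd("fas.traineddata", "fas.")
    # Only run the tool if it is installed and the input file is present.
    if shutil.which("combine_tessdata") and Path("fas.traineddata").exists():
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", " ".join(cmd))
```

If you are unsure which component you need, `unpack_all_cmd` writes all of them next to each other so you can inspect the files directly.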
Is this an issue? There is also no eng.xheights. @AinazRafiei, why do you think you need those files?
Ah, yes, sorry. So what remains to be fixed? Or can this issue be closed?
The unicharset files in the language folders (such as eng and fas) are very different from the files outside those folders. The unicharset files outside the folders are much more complete and contain many more unichars for each language, for example unichars from different fonts. The unicharset files in the language folders were generated during training on Tesseract's own dataset for that language. They are not usable when you want to train Tesseract on your own dataset, because your dataset differs from the one Tesseract used.
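One way to check whether a given unicharset fits your own dataset is to list the characters in your training text that it does not contain. The sketch below assumes the usual unicharset text layout (first line is the entry count, and each following line starts with the unichar itself); it skips the `NULL` entry and whitespace. It compares single code points only, so multi-code-point unichars (ligatures, combined graphemes) are not matched and the result is approximate. The function names are hypothetical.

```python
def read_unichars(unicharset_text: str) -> set[str]:
    """Collect the unichar strings listed in a unicharset file's text."""
    lines = unicharset_text.splitlines()
    count = int(lines[0].strip())  # first line: number of entries
    chars: set[str] = set()
    for line in lines[1:1 + count]:
        if not line.strip():
            continue
        token = line.split(" ", 1)[0]  # first field is the unichar
        if token != "NULL":  # skip the special first entry
            chars.add(token)
    return chars


def missing_from_unicharset(training_text: str, unichars: set[str]) -> set[str]:
    """Non-whitespace code points in the text that are not unichars in the set."""
    return {ch for ch in training_text if not ch.isspace() and ch not in unichars}
```

For Persian text it is worth checking letters such as پ, چ, ژ and گ explicitly, since they do not occur in Arabic and are the ones most likely to be missing from a character set built for another language.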
There are no fas.xheights and fas.unicharset files for the Persian language. Without these data, how can we train Tesseract with LSTM on Persian? Could you please add them, or explain how we can make them?
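To build a unicharset from your own data instead, Tesseract provides `unicharset_extractor`, which reads plain text or box files. The sketch below only assembles the command line, assuming the `--output_unicharset` and `--norm_mode` flags used by the tesstrain Makefile; confirm them with `unicharset_extractor --help` for your Tesseract version, and pick the normalization mode from that version's documentation. The function name and file names are hypothetical.

```python
def unicharset_extractor_cmd(inputs: list[str], output: str, norm_mode: int) -> list[str]:
    """Command that builds a unicharset from the given text or box files.

    Flag names assume the tesstrain Makefile's usage; verify locally.
    """
    return [
        "unicharset_extractor",
        "--output_unicharset", output,
        "--norm_mode", str(norm_mode),
        *inputs,
    ]


if __name__ == "__main__":
    # Hypothetical ground-truth text file for a Persian dataset.
    print(" ".join(unicharset_extractor_cmd(["fas_training_text.txt"], "fas.unicharset", 1)))
```

Building the unicharset from your own ground truth keeps it consistent with the characters you actually train on, which is the mismatch described in the previous comment.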