Modern texts, especially business documents, contain bullet-like symbols, e.g. for lists. The middle dot is also used fairly often. While the recognition results for eng and deu are otherwise nearly perfect, the results for these symbols are "random".
For the next release of the trained models, the training data should be improved to cover these symbols, and possibly other symbols as well.
Test image:
Tesseract result with -l eng:
List of vehicles:
* Trucks
* vans
* bicycles
Liste von Fahrzeugen:
e Lastwagen
e Transporter
e Fahrrader
Result with -l deu:
List of vehicles:
« Trucks
« vans
+ bicycles
Liste von Fahrzeugen:
e Lastwagen
e Transporter
e Fahrräder
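To make the "random" behaviour easier to see, here is a minimal sketch that tallies which character each run produced in place of the bullet. The two strings are copied from the results above; the helper `leading_markers` and its heuristic (a line counts as a list item if its first token is a single character followed by more text) are assumptions made for illustration, not part of Tesseract.

```python
from collections import Counter

# OCR output copied verbatim from the two runs above.
ENG_OUTPUT = """List of vehicles:
* Trucks
* vans
* bicycles
Liste von Fahrzeugen:
e Lastwagen
e Transporter
e Fahrrader"""

DEU_OUTPUT = """List of vehicles:
« Trucks
« vans
+ bicycles
Liste von Fahrzeugen:
e Lastwagen
e Transporter
e Fahrräder"""


def leading_markers(ocr_text):
    """Count the leading single-character token of each list-like line.

    Hypothetical heuristic: a line is treated as a list item when its
    first whitespace-separated token is exactly one character long and
    is followed by more text.
    """
    counts = Counter()
    for line in ocr_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and len(parts[0]) == 1:
            counts[parts[0]] += 1
    return counts


if __name__ == "__main__":
    print("eng:", dict(leading_markers(ENG_OUTPUT)))
    print("deu:", dict(leading_markers(DEU_OUTPUT)))
```

For the six list items in the test image, the eng run yields two different characters (`*` and `e`) and the deu run yields three (`«`, `+` and `e`).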