This is not an issue in Tesseract itself but in the model, which does not include the GREEK LUNATE SIGMA SYMBOL (see the unicharsets for grc and script/Greek). I am therefore moving this issue to langdata_lstm.
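To check a claim like this yourself, you could search a unicharset file for the character. The following is only a minimal sketch: it assumes the plain-text unicharset layout in which the first line holds the entry count and every later line begins with the character followed by a space and its properties. The function name `unicharset_contains` and the sample text are hypothetical and heavily simplified; they are not copied from the real grc or script/Greek files.

```python
def unicharset_contains(unicharset_text: str, char: str) -> bool:
    """Return True if `char` appears as an entry in the unicharset text."""
    lines = unicharset_text.splitlines()
    # Skip the first line (assumed to be the entry count); each remaining
    # line is assumed to start with the character, then a space.
    for line in lines[1:]:
        entry = line.split(" ", 1)[0]
        if entry == char:
            return True
    return False


# Simplified, made-up sample for illustration only.
sample = "3\nNULL 0 Common 0\nσ 3 Greek 1\nς 3 Greek 2\n"

print(unicharset_contains(sample, "σ"))       # regular sigma
print(unicharset_contains(sample, "\u03f2"))  # lunate sigma
```

If the character is missing from the unicharset, the model has no way to emit it, whatever the image contains.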
Thanks for moving it. If I understand you correctly, I see the regular sigmas σ (non-final) and ς (final) in the OCR text whenever a lunate sigma ϲ is present in the image not because the lunate sigma is actually recognised as a sigma, but simply because ϲ looks similar to σ/ς.
Current Behavior
A lunate sigma (ϲ, U+03F2) is recognised under language ‘grc’ but is output as a regular sigma (σ or ς).
Expected Behavior
The lunate sigma should be output as U+03F2 (ϲ).
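For reference, the three characters involved are distinct Unicode code points, which is why the substitution is visible in the output text. A small sketch using Python's standard `unicodedata` module (the helper name `describe` is only illustrative):

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Regular sigma, final sigma, and lunate sigma.
for ch in ("\u03c3", "\u03c2", "\u03f2"):
    print(ch, describe(ch))
```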
Suggested Fix
No response
tesseract -v
5.3.0-6-g76ae
Operating System
No response
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
No response
Virtualization / Containers
No response
Other Information
No response