tesseract version #489

Closed
RJVB opened this issue May 27, 2021 · 5 comments

RJVB commented May 27, 2021

Hi,

The build/install notes stipulate that one should use version 3.04 of Tesseract (3.04.00, if memory serves). I went with 3.04.01, which seems to work OK.
Are those instructions up to date? In other words, will Audiveris really NOT work properly with newer Tesseract versions? Version 3.04 is quite old by now.

PS: Actually, the notes only mention the Tesseract language files. Does that mean the Tesseract engine itself isn't required, or is it supposed to be pulled in as a dependency (as the MacPorts install command shown would do)?

PS2: MacPorts port:tesseract is currently at 4.1.1.

@hbitteur
Contributor

@RJVB

I can confirm that Audiveris needs the Tesseract language files for the old version 3.04.01. The newer 4.x versions don't work correctly with text found on music scores (problems with isolated lyric syllables, problems with text bounds).

And you are right: the Tesseract engine is not used. You are one of the very few people who noticed that!
To be precise, Tesseract code is used, but via a linked library (e.g. under Windows we don't use a .exe but a .jar file + javacpp). You can inspect the details in the Gradle dependencies.
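
As a minimal sketch of how the Gradle dependencies mentioned above could be inspected: the helper below only filters a dependency report for Tesseract and Leptonica entries. The function name `filter_ocr_deps` is hypothetical, and the usage line assumes the project has a Gradle wrapper and a `runtimeClasspath` configuration, neither of which is confirmed in this thread.

```shell
# Hypothetical helper (not part of Audiveris): read a Gradle dependency
# report on stdin and keep only the lines that mention Tesseract or
# Leptonica, ignoring case.
filter_ocr_deps() {
  grep -i -E 'tesseract|leptonica'
}

# Usage, assuming a Gradle wrapper in the project root and a
# configuration named runtimeClasspath (both are assumptions):
#   ./gradlew dependencies --configuration runtimeClasspath | filter_ocr_deps
```

The matching lines should show the group, artifact, and version that Gradle resolves, which is enough to tell which Tesseract build is linked in.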

@RJVB
Author

RJVB commented May 29, 2021

Indeed, I see a reference to that effect. Does it pull in a prebuilt libtesseract, or does it use the one on the system?

@hbitteur
Contributor

According to Gradle, it uses tesseract:3.04.01-1.3 and leptonica:1.73-1.3.

I suppose, though I don't know the details, that these libraries sit in a cache into which they were downloaded at some earlier point.
For example, from my user account on Windows, I can see these files:

$ find . -name "*tesseract*"
./.gradle/caches/modules-2/files-2.1/org.bytedeco.javacpp-presets/tesseract
./.gradle/caches/modules-2/files-2.1/org.bytedeco.javacpp-presets/tesseract/3.04.01-1.3/80b57bf68fcf851fbbc44c28e10cb4ad3036fe7b/tesseract-3.04.01-1.3.jar
./.gradle/caches/modules-2/files-2.1/org.bytedeco.javacpp-presets/tesseract/3.04.01-1.3/ee4eeb8cb480a063ac74c87c0102c865db8f474f/tesseract-3.04.01-1.3-windows-x86_64.jar
./.gradle/caches/modules-2/metadata-2.71/descriptors/org.bytedeco.javacpp-presets/tesseract
./.javacpp/cache/tesseract-3.04.01-1.3-windows-x86_64.jar
./.javacpp/cache/tesseract-3.04.01-1.3-windows-x86_64.jar/org/bytedeco/javacpp/windows-x86_64/jnitesseract.dll
./.javacpp/cache/tesseract-3.04.01-1.3-windows-x86_64.jar/org/bytedeco/javacpp/windows-x86_64/libtesseract-3.dll
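
To check which bundled version such a cache holds, the listing above can be reduced to its distinct version strings. This is only a sketch: the function name `tesseract_jar_versions` is hypothetical, and it assumes jar names of the form `tesseract-<version>.jar` or `tesseract-<version>-<platform>.jar`, as in the paths shown.

```shell
# Hypothetical helper: read file paths on stdin and print the distinct
# version strings taken from jar names such as
#   tesseract-3.04.01-1.3.jar
#   tesseract-3.04.01-1.3-windows-x86_64.jar
# Paths that do not end in such a jar name are ignored.
tesseract_jar_versions() {
  sed -n -E 's#.*/tesseract-([0-9.]+-[0-9.]+)(-.*)?\.jar$#\1#p' | sort -u
}

# Usage, assuming the caches live under the home directory as in the
# listing above:
#   find ~/.gradle/caches ~/.javacpp/cache -name '*tesseract*.jar' | tesseract_jar_versions
```

A single line of output would suggest that only one Tesseract build is cached; several lines would mean more than one version has been downloaded at some point.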

@hbitteur closed this as completed Aug 5, 2021
@stweil
Contributor

stweil commented Jan 9, 2022

> The newer 4.x versions don't work correctly with text found on music scores (problems with isolated lyric syllables, problems with text bounds).

@hbitteur, what about the newer Tesseract 3.05.01? Does it work? Is there an example that shows the problems with newer versions? Even the latest version, Tesseract 5.0.1, should still be compatible with Tesseract 3, so I wonder what goes wrong there.

Building Audiveris with Tesseract 3.05.01 or 4.0.0 works. It only requires updating javacpp and Leptonica as well.

@stweil
Contributor

stweil commented Jan 9, 2022

I'll continue this discussion in the related issue #273.
