
OCR fails to call tesseract #19

Open
rourouZ opened this issue Feb 28, 2025 · 9 comments

Comments

@rourouZ

rourouZ commented Feb 28, 2025

Hello, after successfully installing tesseract 5.0.0 on my machine, calling tesseract through kreuzberg fails. Do I need to change some configuration? The error is:

kreuzberg.exceptions.MissingDependencyError: MissingDependencyError: Tesseract is not installed or not in path. Please install tesseract 5 and above on your system.

In cmd, tesseract -v prints the version number correctly, and pytesseract also works after I modify pytesseract.pytesseract.tesseract_cmd.

@Goldziher
Owner

Hi,

Can you tell me what operating system you are using?

Also, please include the output of tesseract -v. Probably the regex I'm using doesn't handle the output format on your system correctly.
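If it is easier to capture that output from Python than from cmd, here is a minimal standard-library sketch. tesseract_version_output is a hypothetical helper, not part of kreuzberg. It resolves the executable through PATH with shutil.which and keeps both stdout and stderr because, as far as I know, some older tesseract releases printed the version banner to stderr.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version_output(cmd: str = "tesseract") -> Optional[str]:
    """Return the combined output of `<cmd> -v`, or None if cmd is not on PATH."""
    # Resolve the executable the same way a PATH-based caller would.
    exe = shutil.which(cmd)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-v"],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate non-UTF-8 console encodings
        check=False,
    )
    # Keep both streams, since the banner may be written to either one.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    output = tesseract_version_output()
    print(output if output is not None else "tesseract not found on PATH")
```

A None result here, while tesseract -v works in cmd, would already suggest that the Python process sees a different PATH.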

@rourouZ
Author

rourouZ commented Feb 28, 2025

Windows 10. Tesseract is installed on drive D, and I have configured the environment variables.
C:\Users\zaa>tesseract -v
tesseract v5.0.0.20211201
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

@Goldziher
Owner

Goldziher commented Feb 28, 2025

Thanks, I'll look into it ASAP.

If you'd like to explore and submit a fix, you can modify the regex in kreuzberg._tesseract on your local system and see whether that fixes it.
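As an illustration of the kind of change meant here, the sketch below uses a hypothetical pattern (not the one actually in kreuzberg._tesseract) that tolerates the leading "v" in the reported version line "tesseract v5.0.0.20211201". A pattern that expects a digit directly after "tesseract " would not match that line; this is one plausible cause, not a confirmed diagnosis. parse_tesseract_version and is_supported are illustrative names.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an optional "v" before the version number, so both
# "tesseract 5.3.0" and "tesseract v5.0.0.20211201" are accepted.
VERSION_PATTERN = re.compile(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", re.IGNORECASE)


def parse_tesseract_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `tesseract -v` output, or None."""
    match = VERSION_PATTERN.search(output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_supported(output: str, minimum_major: int = 5) -> bool:
    """True if the parsed major version is at least `minimum_major`."""
    version = parse_tesseract_version(output)
    return version is not None and version[0] >= minimum_major


if __name__ == "__main__":
    reported = "tesseract v5.0.0.20211201\nleptonica-1.78.0"
    print(parse_tesseract_version(reported))
    # A stricter, digit-only pattern finds nothing in the same text.
    print(re.search(r"tesseract (\d+)", reported))
```

Using search rather than match means any prompt or extra text before the version line (as in the pasted cmd session) is skipped.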

If you do, submit a PR with the fix!

@Cycloctane
Contributor

What did you set pytesseract.pytesseract.tesseract_cmd to when it worked?

@Goldziher
Owner

I'm trying to understand what you wrote by translating it, but it's not optimal :)

The library doesn't use pytesseract at all. See kreuzberg._tesseract.py
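Since kreuzberg does not go through pytesseract, setting pytesseract.pytesseract.tesseract_cmd has no effect on it; a caller that runs the tesseract executable presumably finds it through PATH instead. Assuming that is how kreuzberg locates it, a minimal workaround sketch is to put the tesseract installation directory first on the current process's PATH before calling kreuzberg. prepend_to_path is a hypothetical helper, and the directory in the usage line is a placeholder, not the reporter's actual path.

```python
import os


def prepend_to_path(directory: str) -> None:
    """Move `directory` to the front of this process's PATH.

    Child processes started afterwards inherit the updated PATH, and
    shutil.which() lookups in this process see it as well.
    """
    target = os.path.normcase(os.path.normpath(directory))
    raw = os.environ.get("PATH", "")
    parts = raw.split(os.pathsep) if raw else []
    entries = [
        entry
        for entry in parts
        # Keep empty entries untouched; drop any existing copy of `directory`
        # so that it appears exactly once, at the front.
        if not entry or os.path.normcase(os.path.normpath(entry)) != target
    ]
    os.environ["PATH"] = os.pathsep.join([directory, *entries])


if __name__ == "__main__":
    # Placeholder: replace with the folder that contains tesseract.exe.
    prepend_to_path(r"D:\path\to\tesseract")
    # ...then import and call kreuzberg in this same process.
```

This only changes PATH for the running Python process and its children; it does not touch the system environment variables.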

@Cycloctane
Contributor

Cycloctane commented Feb 28, 2025 via email

@rourouZ
Author

rourouZ commented Feb 28, 2025

> Yes, I understand that. He said he also tried pytesseract and was able to use it after manually modifying pytesseract.pytesseract.tesseract_cmd. I guess this may be an issue with his tesseract executable name or PATH environment variable, so I'm asking what tesseract_cmd value he was using.

I set it to the tesseract installation path.
I also configured the environment variables.

@Goldziher
Owner

Oh thanks 🙏.

@Cycloctane
Contributor

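If pytesseract has to be called with an absolute path, that suggests the environment variable configuration may still be the problem. Could you run the following in Python to confirm?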

import os
print(os.environ['PATH'])
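Going one step further than printing PATH, the hypothetical helper below walks each PATH entry in order and reports every tesseract executable it finds, roughly like the Windows where command. It uses only the standard library. If the list is empty inside Python while tesseract -v works in cmd, one common reason is that the Python process (for example an IDE or a terminal opened before the variable was changed) is seeing an older PATH.

```python
import os
import shutil
from typing import List


def find_all_on_path(name: str = "tesseract") -> List[str]:
    """Return every match for `name` across PATH entries, in search order."""
    matches: List[str] = []
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if not entry:
            continue
        # Restrict shutil.which to this one directory; on Windows it also
        # tries the extensions listed in PATHEXT (e.g. ".exe").
        found = shutil.which(name, path=entry)
        if found is not None and found not in matches:
            matches.append(found)
    return matches


if __name__ == "__main__":
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        print(entry)
    print("tesseract matches:", find_all_on_path() or "none")
```

The first match in the list is the one a plain PATH lookup would pick, so it also shows whether an unexpected tesseract copy shadows the one installed on drive D.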


3 participants