Error on running pypdfocr #61

Open
ediwill opened this issue Mar 12, 2017 · 5 comments

@ediwill

ediwill commented Mar 12, 2017

I get the following error output when running pypdfocr on 32-bit Windows:

`
C:\OCRdir>pypdfocr test1.pdf
Starting conversion of test1.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
File "C:\Python27\Scripts\pypdfocr-script.py", line 11, in <module>
load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
uptodate,ver = self._is_version_uptodate()
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

C:\OCRdir>

`

@n828408

n828408 commented Mar 29, 2017

I have the same error.

'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
File "c:\python27\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\Scripts\pypdfocr.exe\__main__.py", line 9, in <module>
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
uptodate,ver = self._is_version_uptodate()
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

@AbhishekTanksali

I am facing exactly the same issue.

Here is the error. Please suggest a resolution as soon as possible :)

Starting conversion of sample-pio-card-application.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Users\abhishek.tanksali\AppData\Local\Continuum\Anaconda2\Scripts\pypdfocr.exe\__main__.py", line 9, in <module>
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
uptodate,ver = self._is_version_uptodate()
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

@farrukhqayum

Did anyone find a solution to this problem?

@skertbot

Try downgrading Tesseract from version 4.00.00alpha to version 3.05.01. The version check built into the pypdfocr package expects every dot-separated component of the version string to be an integer, hence the error on '00alpha'. (Note: the version checker does account for 'dev' versions, just not versions ending in any other string.)
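
To see why the check trips on the alpha build, here is a minimal sketch that reuses the list comprehension from line 98 of the traceback. The wrapper name `parse_version_strict` is hypothetical and exists only for this illustration; the snippet is written for Python 3, although the comprehension itself behaves the same way on Python 2.7.

```python
def parse_version_strict(ver_str):
    # Same expression as in the traceback: every dot-separated
    # component must be a pure integer literal.
    return [int(x) for x in ver_str.split('.')]


# A plain numeric version parses without trouble.
print(parse_version_strict("3.05.01"))

# The alpha build fails on its last component, '00alpha'.
try:
    parse_version_strict("4.00.00alpha")
except ValueError as exc:
    print("ValueError:", exc)
```

The first two components of '4.00.00alpha' convert fine; only the trailing '00alpha' raises, which is why downgrading to a release with a purely numeric version string avoids the crash.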

@swoldetsadick

ver = [int(x) for x in ver_str.split('.') if x.isdigit()]
req = [int(x) for x in self.required.split('.') if x.isdigit()]
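This is how the version parsing in `_is_version_uptodate` should look, so that non-numeric components such as '00alpha' are skipped instead of raising.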
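
One caveat with the `isdigit()` filter is that it drops the '00alpha' component entirely, so '4.00.00alpha' becomes `[4, 0]` rather than `[4, 0, 0]`. A possible alternative, sketched below for Python 3, keeps the leading digits of each component instead. The helper names `parse_version` and `is_version_uptodate` and the default `required="3.02.02"` are assumptions made for this example, not pypdfocr's actual code or configured minimum.

```python
import re


def parse_version(ver_str):
    """Return the numeric part of a version string as a tuple of ints.

    Keeps the leading digits of each dot-separated component and stops
    at the first component that carries a non-numeric suffix.
    """
    parts = []
    for component in ver_str.strip().split('.'):
        match = re.match(r'(\d+)(.*)', component)
        if match is None:
            break  # component has no leading digits at all, e.g. 'dev'
        parts.append(int(match.group(1)))
        if match.group(2):
            break  # suffix such as 'alpha': ignore anything after it
    return tuple(parts)


def is_version_uptodate(ver_str, required="3.02.02"):
    # 'required' is a placeholder minimum chosen for this sketch.
    return parse_version(ver_str) >= parse_version(required)


print(parse_version("4.00.00alpha"))
print(is_version_uptodate("4.00.00alpha"))
```

Tuples compare element by element, so a parsed alpha build of 4.00.00 ranks above any 3.x requirement while the suffix is simply ignored.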
