Error on running pypdfocr #61

ediwill · 2017-03-12T20:52:23Z

I get the following error output when running pypdfocr on 32-bit Windows:

`
C:\OCRdir>pypdfocr test1.pdf
Starting conversion of test1.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
File "C:\Python27\Scripts\pypdfocr-script.py", line 11, in <module>
load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
uptodate,ver = self._is_version_uptodate()
File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

C:\OCRdir>

`

n828408 · 2017-03-29T10:27:32Z

I have the same error.

'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
File "c:\python27\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\Scripts\pypdfocr.exe\__main__.py", line 9, in <module>
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
uptodate,ver = self._is_version_uptodate()
File "c:\python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

AbhishekTanksali · 2017-05-23T05:47:49Z

I am facing exactly the same issue.

Here is the error. Kindly suggest a resolution as early as possible :)

Starting conversion of sample-pio-card-application.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Users\abhishek.tanksali\AppData\Local\Continuum\Anaconda2\Scripts\pypdfocr.exe\__main__.py", line 9, in <module>
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
uptodate,ver = self._is_version_uptodate()
File "c:\users\abhishek.tanksali\appdata\local\continuum\anaconda2\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

farrukhqayum · 2017-07-16T14:14:53Z

Did anyone find a solution to this problem?

skertbot · 2017-08-14T09:52:09Z

Try downgrading Tesseract from version 4.00.00alpha to version 3.05.01. The version check built into the pypdfocr package expects every dot-separated component of the Tesseract version string to be an integer, hence the error on '00alpha' (note: the version checker does account for 'dev' versions, just not versions ending with strings other than 'dev').
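
To make the failure concrete, here is a minimal sketch that reproduces it outside pypdfocr. The list comprehension is the one shown in the tracebacks above; the function name `parse_strict` is made up for illustration and is not part of pypdfocr.

```python
def parse_strict(ver_str):
    # Same expression as line 98 of pypdfocr_tesseract.py in the tracebacks:
    # every dot-separated component must be a valid integer literal.
    return [int(x) for x in ver_str.split('.')]


# A plain release string parses without trouble.
release = parse_strict('3.05.01')

# An alpha build splits into '4', '00' and '00alpha'; int('00alpha') raises.
try:
    parse_strict('4.00.00alpha')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

With 3.05.01 installed the components are all numeric, which is why the downgrade avoids the crash.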

swoldetsadick · 2018-12-07T12:47:00Z

ver = [int(x) for x in ver_str.split('.') if x.isdigit()]
req = [int(x) for x in self.required.split('.') if x.isdigit()]
This is how it should be: keep only the purely numeric components, so '00alpha' is skipped instead of being passed to int().
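
A small sketch of how that filter behaves, for anyone who wants to try it before patching their install. The helper names `parse_numeric` and `is_uptodate` and the required version '3.02.02' are assumptions for illustration only; the real `_is_version_uptodate` in pypdfocr may compare and return values differently.

```python
def parse_numeric(ver_str):
    # Keep only components made up entirely of digits, as in the fix above.
    return [int(x) for x in ver_str.split('.') if x.isdigit()]


def is_uptodate(ver_str, required):
    # Hypothetical comparison: Python compares lists element by element.
    return parse_numeric(ver_str) >= parse_numeric(required)


alpha = parse_numeric('4.00.00alpha')            # '00alpha' is dropped
ok = is_uptodate('4.00.00alpha', '3.02.02')      # assumed required version
edge = is_uptodate('3.02.02alpha', '3.02.02')    # suffix on the last component
```

One thing to be aware of: a component with a suffix is dropped entirely rather than trimmed, so the parsed list gets shorter. That is harmless for 4.00.00alpha, but a string like '3.02.02alpha' would compare as older than '3.02.02' under this sketch.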

kaumaron mentioned this issue Jul 9, 2019

pypdfocr error [ValueError: invalid literal for int() with base 10: ''] in '\pypdfocr_pdf", line 98, in overlay_hocr' #9

Closed
