Error on running pypdfocr #61
Comments
I have the same error: 'pdfimages' is not recognized as an internal or external command,
I am facing exactly the same issue. Kindly suggest a resolution as soon as possible. :) Here is the error:
Starting conversion of sample-pio-card-application.pdf
Did anyone find a solution to this problem?
Try downgrading Tesseract from version 4.00.00alpha to version 3.05.01. The version check built into pypdfocr expects every dot-separated component of the version number to be an integer, hence the error on '00alpha'. (The version checker does account for 'dev' versions, just not versions ending in strings other than 'dev'.)
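Alternatively, the version check in `_is_version_uptodate` (pypdfocr_tesseract.py) can be changed to skip non-numeric components:
`ver = [int(x) for x in ver_str.split('.') if x.isdigit()]`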
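A slightly more tolerant parser is sketched below, assuming Tesseract reports versions like `3.05.01` or `4.00.00alpha`. Instead of dropping a component such as `00alpha` entirely (the one-line filter gives `[4, 0]` for `4.00.00alpha`), it keeps the leading digits of each component, so the result is `[4, 0, 0]`. The helpers `parse_tesseract_version` and `is_version_at_least` are hypothetical and not part of pypdfocr, and the minimum `[3, 2]` is only an example value.

```python
import re


def parse_tesseract_version(ver_str):
    """Hypothetical helper (not part of pypdfocr): turn a version string
    such as '4.00.00alpha' into a list of integers.

    Each dot-separated component contributes only its leading digits, so
    suffixes like 'alpha' or 'dev' are ignored instead of raising
    ValueError. Components with no leading digits are skipped.
    """
    ver = []
    for part in ver_str.strip().split('.'):
        match = re.match(r'\d+', part)  # leading digits only
        if match:
            ver.append(int(match.group()))
    return ver


def is_version_at_least(ver_str, minimum):
    """Hypothetical helper: compare a parsed version with a minimum list."""
    # Lists compare element by element, so [4, 0, 0] >= [3, 2] is True.
    return parse_tesseract_version(ver_str) >= list(minimum)


if __name__ == '__main__':
    for s in ['3.05.01', '4.00.00alpha', '3.04.00dev']:
        print(s, parse_tesseract_version(s))
    # 3.05.01 [3, 5, 1]
    # 4.00.00alpha [4, 0, 0]
    # 3.04.00dev [3, 4, 0]
```

Keeping the leading digits rather than discarding the whole component preserves the number of version fields, which keeps element-by-element comparisons against a minimum version meaningful.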
I get the following error output when running on 32-bit Windows:
```
C:\OCRdir>pypdfocr test1.pdf
Starting conversion of test1.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
  File "C:\Python27\Scripts\pypdfocr-script.py", line 11, in
    load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
  File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 359, in run_conversion
    hocr_filenames = self.ts.make_hocr_from_pnms(preprocess_imagefilenames)
  File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 132, in make_hocr_from_pnms
    uptodate,ver = self._is_version_uptodate()
  File "C:\Python27\lib\site-packages\pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
    ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'
C:\OCRdir>
```
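The `'pdfimages' is not recognized` message means Windows could not find a `pdfimages` executable on `PATH`; as the warning says, that program comes with xpdf or poppler. A quick way to check which external programs are visible before running a conversion is sketched below using `shutil.which` (Python 3.3+, so run it with a Python 3 interpreter even if pypdfocr is installed under Python 2.7). The tool list is an assumption based on the messages in this thread, and pypdfocr may need other programs too; `find_missing_tools` is a hypothetical helper, not part of pypdfocr.

```python
import shutil

# Assumed list of external programs, taken from the errors in this thread.
# pypdfocr may require additional tools that are not listed here.
REQUIRED_TOOLS = ['pdfimages', 'tesseract']


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    # On Windows, shutil.which also tries the extensions in PATHEXT,
    # so 'pdfimages' matches 'pdfimages.exe'.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == '__main__':
    missing = find_missing_tools()
    if missing:
        print('Not found on PATH: ' + ', '.join(missing))
    else:
        print('All listed tools were found on PATH.')
```

If `pdfimages` is reported missing, adding the directory that contains `pdfimages.exe` to `PATH` (and opening a new command prompt) should make it visible to pypdfocr as well.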