OCR project that fetches the bill amount from a PDF invoice
Install the Python dependencies (pip expects the package names separated by spaces, not commas):
pip install streamlit pytesseract pdf2image opencv-python
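To confirm the install worked, a small check like the one below can help. It is a sketch, not part of the project. The helper name `missing_packages` is hypothetical. It maps each pip package to the module name you actually import (opencv-python is imported as `cv2`) and uses `importlib.util.find_spec` to see which ones Python can find.

```python
import importlib.util

# pip package name -> module name used in `import` (opencv-python is imported as cv2)
REQUIRED = {
    "streamlit": "streamlit",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
    "opencv-python": "cv2",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, mod in required.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Note that this only checks the Python wrappers. The Tesseract and Poppler programs described next are installed separately.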
Besides these packages, you also need two external programs: the Tesseract OCR engine (which pytesseract calls) and Poppler (which pdf2image uses to render PDF pages).
Tesseract: install it from here: https://github.com/UB-Mannheim/tesseract/wiki
Then set the pyts_path variable to the path of the Tesseract executable ("tesseract.exe").
For example, it was this for me:
pyts_path = r"C:\Program Files\Tesseract-OCR\Tesseract.exe"
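pytesseract reads the location of the Tesseract executable from `pytesseract.pytesseract.tesseract_cmd`. Below is a minimal sketch of wiring `pyts_path` into it. The helpers `find_tesseract` and `configure_tesseract` are hypothetical, as is the fallback to a `tesseract` found on PATH; the project itself may simply assign the path directly. The raw string (`r"..."`) keeps the Windows backslashes literal, so sequences like `\t` or `\n` in a path are never turned into escape characters.

```python
import os
import shutil
from typing import Optional

# Raw string so the backslashes in the Windows path are kept literally.
pyts_path = r"C:\Program Files\Tesseract-OCR\Tesseract.exe"


def find_tesseract(configured_path: str = pyts_path) -> Optional[str]:
    """Return configured_path if that file exists, else the tesseract on PATH (or None)."""
    if os.path.isfile(configured_path):
        return configured_path
    return shutil.which("tesseract")


def configure_tesseract(configured_path: str = pyts_path) -> str:
    """Point pytesseract at the Tesseract executable and return the path used."""
    import pytesseract  # imported lazily so find_tesseract() works without it

    cmd = find_tesseract(configured_path)
    if cmd is None:
        raise FileNotFoundError(f"Tesseract not found at {configured_path!r} or on PATH")
    # pytesseract runs this executable for every OCR call
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_tesseract()` once at startup, before any `pytesseract.image_to_string(...)` call, is enough.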
Poppler: download it from here: https://blog.alivate.com.au/poppler-windows/ and extract it.
Then set the poppler_path variable to the path of Poppler's bin folder.
For example, it was this for me:
poppler_path = r"C:\Program Files\poppler-0.68.0\bin"
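pdf2image's `convert_from_path` accepts a `poppler_path` argument pointing at that bin folder; when it is `None`, pdf2image looks for the Poppler tools on PATH instead. Here is a minimal sketch of passing the variable through. The helper names `resolve_poppler_path` and `pdf_to_images` are hypothetical, and `dpi=300` is an assumed setting (higher resolution usually helps OCR, at the cost of speed), not necessarily what the project uses.

```python
import os
from typing import Optional

# Raw string: Poppler's bin folder (the one containing pdftoppm.exe)
poppler_path = r"C:\Program Files\poppler-0.68.0\bin"


def resolve_poppler_path(configured: str = poppler_path) -> Optional[str]:
    """Return the configured bin folder if it exists; otherwise None, so that
    pdf2image falls back to looking for the Poppler tools on PATH."""
    return configured if os.path.isdir(configured) else None


def pdf_to_images(pdf_path: str, dpi: int = 300):
    """Render each PDF page to a PIL image (one list item per page)."""
    from pdf2image import convert_from_path  # lazy import; Poppler is needed at call time

    return convert_from_path(pdf_path, dpi=dpi, poppler_path=resolve_poppler_path())
```

For an uploaded file held in memory, `pdf2image.convert_from_bytes` takes the same `dpi` and `poppler_path` arguments.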
Open a PowerShell window in the folder that contains the source code.
Start the Streamlit app with the following command:
streamlit run .\OCR_streamlit.py
The app window will then open up in your browser.
Upload an invoice PDF and the app will display its bill amount.
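This README does not say how the bill amount is picked out of the OCR text, so the following is only one plausible approach, not the project's method. It assumes the amount follows a label such as "Total" or "Amount Due" on the same line. The label list, the money regex and the function name `extract_bill_amount` are all hypothetical and would need tuning for real invoices. The input would be the text returned by `pytesseract.image_to_string(page)` for each rendered page.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical label list, most specific first; tune it for your own invoices.
AMOUNT_LABELS = ("grand total", "total amount", "amount due", "balance due", "total")

# Money values such as 1,180.00 / 1180.00 / 2,500 / 99 (comma-grouped form tried first)
MONEY_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?")


def extract_bill_amount(ocr_text: str) -> Optional[Decimal]:
    """Return the number that follows the highest-priority amount label, or None."""
    lines = [line.strip().lower() for line in ocr_text.splitlines() if line.strip()]
    for label in AMOUNT_LABELS:
        # \b keeps "total" from matching inside "subtotal"
        label_re = re.compile(r"\b" + re.escape(label) + r"\b")
        # Scan bottom-up: the final total usually sits near the end of an invoice
        for line in reversed(lines):
            match = label_re.search(line)
            if match:
                numbers = MONEY_RE.findall(line[match.end():])
                if numbers:
                    # Take the last number, e.g. skip the "18" in "Total (incl. 18% tax): 118.00"
                    return Decimal(numbers[-1].replace(",", ""))
    return None
```

Using `Decimal` instead of `float` keeps values like 1180.00 exact, which matters if the amount is later summed or compared.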