Tesseract OCR 📜

Docker image with the latest Tesseract OCR version 5.x.x, built from source.

The sources are pulled from the latest main branch and latest releases of the Tesseract OCR project.

GitHub Repository: https://github.com/Franky1/Tesseract-OCR-5-Docker

Tags 🏷️

latest : Whenever there is a change in the sources, the main branch is pulled and the image is rebuilt. Changes are checked on a daily basis.

5.x.x : Whenever there is a new Tesseract OCR 5.x.x release, the sources from that release are pulled and the image is built and tagged accordingly. New releases are checked for on a daily basis.

Usage 🛠️

Pull Docker Image

Pull the docker image from Docker Hub:

docker pull franky1/tesseract
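To pin a release tag instead of latest, the pull can be wrapped in a small helper. This is only a sketch: the function names `image_ref` and `pull_and_check` are made up for illustration, and the tag you pass must be one that actually exists on Docker Hub.

```shell
#!/bin/sh
# Illustrative helper names; they are not part of the image or the repository.
IMAGE_REPO="franky1/tesseract"

image_ref() {
  # $1: tag to use; falls back to "latest" when empty or unset
  printf '%s:%s\n' "${IMAGE_REPO}" "${1:-latest}"
}

pull_and_check() {
  # Pull the image, then print the Tesseract version it contains
  ref="$(image_ref "${1:-}")"
  docker pull "${ref}" && docker run --rm "${ref}" tesseract --version
}
```

Calling `pull_and_check` without an argument pulls latest; calling it with a 5.x.x release tag pulls that release.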

Run Docker Container

See the GitHub repository for a better understanding of the steps below.

Mount your image data to the /tmp directory and run the Tesseract OCR container with the required command line options. For example, to run the container on a test image:

docker run -it -v ${PWD}/testdata:/tmp --rm franky1/tesseract \
  tesseract english.png output --oem 1 -l eng

For the Tesseract command line options, please refer to the Tesseract Manual.
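To process several images, the single command above can be wrapped in a loop. The following is a minimal sketch, not part of the image: `ocr_one`, `ocr_all`, `DATA_DIR` and the `DRY_RUN` switch are illustrative names, and it assumes PNG files in the mounted folder. The `-it` flags are left out because a loop does not need an interactive terminal.

```shell
#!/bin/sh
# Illustrative names and defaults; adjust to your setup.
IMAGE="franky1/tesseract"
DATA_DIR="${DATA_DIR:-${PWD}/testdata}"

ocr_one() {
  # $1: image file name inside DATA_DIR, $2: Tesseract language code (default eng)
  file="$1"
  lang="${2:-eng}"
  base="${file%.*}"   # output base name; Tesseract appends .txt to it
  set -- docker run --rm -v "${DATA_DIR}:/tmp" "${IMAGE}" \
    tesseract "${file}" "${base}" --oem 1 -l "${lang}"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"   # only print the command
  else
    "$@"                 # run the container
  fi
}

ocr_all() {
  # $1: Tesseract language code applied to every PNG in DATA_DIR
  for img in "${DATA_DIR}"/*.png; do
    [ -e "${img}" ] || continue   # nothing matched the glob
    ocr_one "$(basename "${img}")" "${1:-eng}"
  done
}
```

Running `DRY_RUN=1 ocr_all eng` prints the commands without starting a container, which is a cheap way to check the paths first. Each text result is written next to its image in the mounted folder.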

Mount more languages 🗣️

See the GitHub repository for a better understanding of the steps below.

Test whether the languages mounted from your local subfolder tessdata are available in the Docker container. Be aware that the mounted local languages replace the languages installed in the Docker image, so inside the container only the languages present in the local folder are available. Example here with the French language:

docker run -it -v ${PWD}/testdata:/tmp \
  -v ${PWD}/tessdata:/usr/local/share/tessdata/ \
  --rm franky1/tesseract
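Because the mounted folder takes the place of the image's own trained data directory, it can help to check the local folder before starting the container. A minimal sketch, assuming the trained data files follow the usual `<code>.traineddata` naming; `missing_langs` and `TESSDATA_DIR` are illustrative names:

```shell
#!/bin/sh
# Illustrative pre-flight check; not part of the image.
TESSDATA_DIR="${TESSDATA_DIR:-${PWD}/tessdata}"

missing_langs() {
  # Print every requested language code that has no
  # <code>.traineddata file in the local folder.
  for code in "$@"; do
    [ -f "${TESSDATA_DIR}/${code}.traineddata" ] || printf '%s\n' "${code}"
  done
}
```

For example, `missing_langs fra eng deu` prints the codes that would not be available after mounting. Inside the container, appending `tesseract --list-langs` to the docker run command lists the languages Tesseract actually sees.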

Test the mounted languages in the Docker container with a sample image. Example here with the French language:

docker run -it -v ${PWD}/testdata:/tmp \
  -v ${PWD}/tessdata:/usr/local/share/tessdata/ \
  --rm franky1/tesseract \
  tesseract french.jpg output --oem 1 -l fra

Image conditions

  • The only supported target platform for this Docker image is currently linux/amd64.
  • The working directory for OCR images is /tmp inside the container. See example above.
  • The directory for trained data is /usr/local/share/tessdata/ inside the container. See example above.
  • This image was built without the Tesseract training tools.
  • This image currently includes only the following languages from the tessdata_best repository:
    • English: tessdata_best > eng.traineddata
    • German: tessdata_best > deu.traineddata
    • If you need other languages, you have to build your own image or mount trained data to the /usr/local/share/tessdata/ directory. See example above.
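One way to fill a local tessdata folder for mounting is to download the trained data files directly. This is a sketch under assumptions: the URL layout (the tessdata_best repository on GitHub, branch main, files in the repository root) is assumed here and should be checked against the repository, and `traineddata_url` and `fetch_lang` are illustrative names. It requires curl on the host.

```shell
#!/bin/sh
# Assumed download location; verify it against the tessdata_best repository.
TESSDATA_BASE_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main"
TESSDATA_DIR="${TESSDATA_DIR:-${PWD}/tessdata}"

traineddata_url() {
  # $1: language code, for example fra
  printf '%s/%s.traineddata\n' "${TESSDATA_BASE_URL}" "$1"
}

fetch_lang() {
  # Download one language file into the local folder that gets mounted
  mkdir -p "${TESSDATA_DIR}" &&
    curl -fL -o "${TESSDATA_DIR}/$1.traineddata" "$(traineddata_url "$1")"
}
```

After `fetch_lang fra`, mounting the folder to /usr/local/share/tessdata/ makes French available. Fetch eng or deu the same way if you still need them, since the mount hides the copies shipped in the image.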

Tesseract Trained Data for all available languages

Further documentation 🔗

Issues 🐛

If you have any bugs or requests regarding this Docker image, please open an issue in the GitHub repository: https://github.com/Franky1/Tesseract-OCR-5-Docker

Project status ✔️

22.03.2022: The Docker image is ready for use; some slight improvements are still possible (see GitHub repository).