Tesseract User Manual

This user manual is for Tesseract versions 5.x. For versions 4.x.x, 3.05.02 and older, see the documentation for old versions.

Introduction

Tesseract is an open source text recognition (OCR) engine, available under the Apache 2.0 license.

Major version 5 is the current stable version and started with release 5.0.0 on November 30, 2021.
Newer minor versions and bugfix versions are available from GitHub.
The latest source code is available from the main branch on GitHub. Open issues can be found in the issue tracker and in the planning documentation.

Tesseract can be used directly via command line, or (for programmers) by using an API to extract printed text from images. It supports a wide variety of languages. Tesseract doesn't have a built-in GUI, but there are several available from the 3rdParty page. External tools, wrappers and training projects for Tesseract are listed under AddOns.
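As a rough illustration of the command-line route, the sketch below assembles a `tesseract` invocation from Python and, optionally, runs it. The helper names `build_tesseract_command` and `run_ocr` and the file name `page.png` are hypothetical and chosen only for this example. The sketch assumes the `tesseract` executable is on the PATH and that the requested language data is installed.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str,
    output_base: str = "stdout",
    lang: str = "eng",
    psm: Optional[int] = None,
    oem: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for one `tesseract` invocation.

    An output base of "stdout" asks tesseract to print the recognized
    text instead of writing an output file.
    """
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]  # page segmentation mode
    if oem is not None:
        cmd += ["--oem", str(oem)]  # OCR engine mode
    return cmd


def run_ocr(image_path: str, lang: str = "eng") -> str:
    """Run tesseract and return the recognized text.

    Hypothetical helper: it needs a working tesseract installation.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang=lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Show the command that would be run; "page.png" is a placeholder name.
print(" ".join(build_tesseract_command("page.png", lang="eng+deu", psm=6)))
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact arguments before any OCR is run. Programs that need tighter integration would use the API instead.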

Tesseract can be used in your own project, under the terms of the Apache License 2.0. It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the 3rdParty and AddOns pages for samples of what has been done with it.

If you have a question, first read the documentation, particularly the FAQ, to see if your problem is addressed there. If not, search the Issues List and the Tesseract user forum. If you still can't find what you need, please ask your question in the Tesseract user forum Google group.

Tesseract is free software, so if you want to pitch in and help, please do! If you find a bug and fix it yourself, the best thing to do is to attach the patch to your bug report in the Issues List.

Releases and Changelog

Tesseract with LSTM

Tesseract 4.0 added a new OCR engine based on LSTM neural networks. It works well on x86/Linux with official Language Model data available for 100+ languages and 35+ scripts. See 4.0x-Changelog for more details.

5.x.x

Source Code

Tesseract 5.x.x source code is available in the main branch of the repository. The main branch uses semantic versioning starting at 5.0.0, because C++ code modernization made the API incompatible with the 4.x releases.

Binaries

Binaries are available from:

Traineddata Files

For detailed information about the different types of models, see Data Files.

Model files for version 4.00 are available from tessdata tagged 4.00. This tag contains models from November 2016. The individual language files are available from the following link.

tessdata 4.00 November 2016

Model files for version 4.0.0 and later are available from tessdata tagged 4.0.0. This tag contains legacy models from September 2017 that have been updated with integer versions of the tessdata_best LSTM models. This set of traineddata files supports both the legacy recognizer with --oem 0 and LSTM models with --oem 1. These models are available from the following GitHub repo.

tessdata

Two more sets of official traineddata, trained at Google, are available in the following GitHub repos. These do not include the legacy models and contain only LSTM models, usable with --oem 1.
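The relationship between the traineddata sets and the --oem values described above can be sketched as a small lookup. This is only an illustration. The names `SUPPORTED_OEMS` and `check_oem` are hypothetical, and treating tessdata_best and tessdata_fast as the two LSTM-only sets is an assumption based on the names used elsewhere in this manual.

```python
from typing import Dict, Tuple

# OEM values as described in the text: 0 = legacy recognizer, 1 = LSTM.
OEM_LEGACY = 0
OEM_LSTM = 1

# Assumption: tessdata_best and tessdata_fast are the two LSTM-only sets.
SUPPORTED_OEMS: Dict[str, Tuple[int, ...]] = {
    "tessdata": (OEM_LEGACY, OEM_LSTM),  # legacy + integer LSTM models
    "tessdata_best": (OEM_LSTM,),        # LSTM only
    "tessdata_fast": (OEM_LSTM,),        # LSTM only
}


def check_oem(model_set: str, oem: int) -> None:
    """Raise ValueError if `oem` is not usable with the given model set."""
    try:
        allowed = SUPPORTED_OEMS[model_set]
    except KeyError:
        raise ValueError(f"unknown traineddata set: {model_set!r}") from None
    if oem not in allowed:
        raise ValueError(
            f"--oem {oem} is not supported by {model_set}; allowed: {allowed}"
        )


check_oem("tessdata", OEM_LEGACY)    # accepted: legacy models are present
check_oem("tessdata_fast", OEM_LSTM)  # accepted: LSTM models are present
```

A check like this, run before invoking Tesseract, would catch a request for the legacy recognizer against an LSTM-only model set.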

The same language model traineddata files listed above for version 4.0.0 can be used with Tesseract 5.x.x. These are available from:

Compiling and Installation

Usage

API Examples

Technical Information

Historical Technical Documentation
API/ABI changes review for Tesseract
Manual Pages
Source Documentation generated by Doxygen
Neural Nets in Tesseract
VGSL Specs
VGSL Specs info from TensorFlow
Network spec for tessdata_fast models
Network spec for tessdata_best models
DAS 2016 tutorial slides: slides #2, #6, and #7 have information about LSTM integration in Tesseract 4.0x.
Tesseract OpenCL - Experimental

Training for Tesseract 5

Training with tesstrain.sh (a.k.a. Tesseract 4 training) is unsupported/abandoned. Please use the scripts from https://github.com/tesseract-ocr/tesstrain for training.
