Workflow Guide recommendations

To make OCR-D easier to use and workflows easier to configure, we provide two workflows that can serve as a starting point for your own OCR-D tests. They were determined by testing the processors listed above on selected pages from several 17th- and 18th-century prints.

The results vary quite a lot from page to page. In most cases, segmentation is a problem.

Note that for our test pages, not all steps described above were needed to obtain the best results. Depending on your particular images, you might want to include those processors again for better results.

We are currently working on regression tests that will soon allow us to provide better-founded workflows, which will replace these interim solutions.

Minimal workflow

Since ocrd-tesserocr-recognize can do binarization (Otsu), region segmentation, table recognition, line segmentation and text recognition at once, just like the upstream tesseract command line tool, it's a good single-step workflow to get a baseline result to compare to granular workflows.

Note: You will most likely obtain significantly better results by configuring a more granular workflow, such as the workflows below.

Step	Processor	Parameter
1	ocrd-tesserocr-recognize	-P segmentation_level region -P textequiv_level word -P find_tables true -P model frak2021

Example with ocrd-process

ocrd process "tesserocr-recognize -P segmentation_level region -P textequiv_level word -P find_tables true -P model frak2021"
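The command above does not name input or output file groups. As a minimal sketch, the script below assembles the same task string from the parameters in the table and adds `-I` and `-O` options. The group names `OCR-D-IMG` and `OCR-D-OCR`, the helper `build_task` and the `RUN` switch are assumptions made for illustration, not part of the original recommendation. By default the script only prints the resulting call, so you can check it before running anything.

```shell
#!/bin/sh
# Sketch: assemble the single-step task from the table above and print the
# resulting `ocrd process` call. Nothing is executed unless RUN=1 is set.

# Hypothetical file-group names; replace with the groups in your workspace.
INPUT_GRP="${INPUT_GRP:-OCR-D-IMG}"
OUTPUT_GRP="${OUTPUT_GRP:-OCR-D-OCR}"

# build_task IN OUT: print the task string for `ocrd process`.
# (Hypothetical helper, introduced here for illustration only.)
build_task() {
    printf '%s' "tesserocr-recognize -I $1 -O $2"
    printf '%s' " -P segmentation_level region -P textequiv_level word"
    printf '%s\n' " -P find_tables true -P model frak2021"
}

TASK=$(build_task "$INPUT_GRP" "$OUTPUT_GRP")

if [ "${RUN:-0}" = "1" ]; then
    # Assumes the current directory is an OCR-D workspace (contains mets.xml)
    # and that the frak2021 model is already installed.
    ocrd process "$TASK"
else
    # Dry run: show the command that would be executed.
    echo "ocrd process \"$TASK\""
fi
```

Running the script without `RUN=1` is a safe way to inspect the call. Set `INPUT_GRP` and `OUTPUT_GRP` in the environment if your workspace uses different file-group names.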
