Stefan Weil edited this page Dec 17, 2019 · 5 revisions

Training Fraktur

This is a collection of sources for training OCR models which can be used to recognize Fraktur.

Austrian Newspapers

Austrian Newspapers is a ground truth data set created from Austrian newspapers by the Austrian National Library (Österreichische Nationalbibliothek). See https://github.com/tesseract-ocr/tesstrain/wiki/AustrianNewspapers for more information.

GT4HistOCR

GT4HistOCR is ground truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin. See https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR for more information.

Ocropus Fraktur

The repository https://github.com/jze/ocropus-model_fraktur provides ground truth data of good quality: 3852 lines for training and 414 lines for testing.

Open issues

  • Some umlauts might be replaced by aͤ, oͤ, uͤ.
  • The ground truth uses the minus sign instead of ⸗ (double oblique hyphen, U+2E17).
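
To get a feeling for how much these two issues affect a given set of transcriptions, the characters in question can simply be counted. The following is a minimal sketch, not part of any of the projects listed here: the function name `audit_line` and the category names are hypothetical, and it assumes that "minus sign" may mean either the ASCII hyphen-minus (U+002D) or the mathematical minus sign (U+2212). It only reports counts and does not change the text, because deciding which umlauts or hyphens should be replaced requires comparing with the line images.

```python
import unicodedata
from collections import Counter

COMBINING_SMALL_E = "\u0364"      # COMBINING LATIN SMALL LETTER E, as in aͤ, oͤ, uͤ
DOUBLE_OBLIQUE_HYPHEN = "\u2e17"  # ⸗
HYPHEN_MINUS = "\u002d"           # ASCII "-"
MINUS_SIGN = "\u2212"             # mathematical minus (assumption: may also occur)


def audit_line(text: str) -> Counter:
    """Count umlaut and hyphen variants in one ground truth line.

    The category names used as keys are hypothetical and chosen for
    illustration only.
    """
    # NFC turns "a" + COMBINING DIAERESIS into the precomposed "ä";
    # "a" + COMBINING LATIN SMALL LETTER E has no precomposed form
    # and stays as two code points.
    nfc = unicodedata.normalize("NFC", text)
    counts = Counter()
    for i, ch in enumerate(nfc):
        if ch in "äöüÄÖÜ":
            counts["umlaut_diaeresis"] += 1
        elif ch == COMBINING_SMALL_E and i > 0 and nfc[i - 1] in "aouAOU":
            counts["umlaut_small_e"] += 1
        elif ch in (HYPHEN_MINUS, MINUS_SIGN):
            counts["hyphen_or_minus"] += 1
        elif ch == DOUBLE_OBLIQUE_HYPHEN:
            counts["double_oblique_hyphen"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the data set.
    print(audit_line("Ma\u0364dchen und scho\u0308ne Ha\u0364u-"))
```

Summing the counters over all `.gt.txt` files of a data set (file layout assumed, as used by tesstrain) shows which convention dominates before any replacement is attempted.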
