This builds on the `IiifPrint::TextExtraction::AltoReader#text` logic. It assumes that we’re building from the derived `hocr` file. However, we could skip the `hocr` generator.
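Building from the derived `hocr` file means reading each word's position out of hOCR, where a word's `title` attribute carries a `bbox x0 y0 x1 y1` property (corner coordinates). The sketch below is a minimal illustration of that conversion; the helper name `bbox_from_title` is hypothetical, and the `[x, y, width, height]` output shape is an assumption about what a downstream word coordinates generator would want, not the generator's confirmed format.

```ruby
# Hypothetical helper: turn an hOCR title attribute such as
# "bbox 10 5 50 17; x_wconf 93" into an [x, y, width, height] box.
# hOCR bboxes are corner coordinates (x0 y0 x1 y1); the [x, y, w, h]
# output shape is an assumption, not the generator's confirmed format.
def bbox_from_title(title)
  match = title.match(/\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)
  return nil unless match # no bbox property on this element

  x0, y0, x1, y1 = match.captures.map(&:to_i)
  [x0, y0, x1 - x0, y1 - y0]
end

bbox_from_title("bbox 10 5 50 17; x_wconf 93") # => [10, 5, 40, 12]
```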
This commit ensures that the word coordinates generator produces unique
values. In IIIF Print's version of the same generator, the same word
coordinates have been observed appearing multiple times, which results in
the UV showing multiple annotations for the same word at the same place.
This commit also sets up the text generator to produce a more
Tesseract-like text file, where line breaks follow the lines of text on
the image rather than appearing after each word. Finally, it sets up the
ALTO generator so that it does not emit duplicate word coordinates
either.
- Resolves: #6
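The two behaviors in this commit (deduplicated word coordinates and line-based text) can be sketched in a few lines of Ruby. This is a minimal illustration, not the generators' actual code: it assumes each OCR word arrives as a hypothetical `Word` struct with `text`, a bounding box (`x`, `y`, `w`, `h`), and a `line` index, and the helper names `word_key`, `word_coordinates`, and `plain_text` are likewise made up for the example.

```ruby
# Hypothetical word record; the real generators may store OCR data differently.
Word = Struct.new(:text, :x, :y, :w, :h, :line, keyword_init: true)

# Key used to decide whether two entries are "the same word at the same place".
def word_key(word)
  [word.text, word.x, word.y, word.w, word.h]
end

# Word coordinates: each distinct word maps to its distinct [x, y, w, h] boxes,
# so a repeated entry cannot turn into a second UV annotation.
def word_coordinates(words)
  words
    .uniq { |word| word_key(word) }
    .group_by(&:text)
    .transform_values { |same| same.map { |word| [word.x, word.y, word.w, word.h] } }
end

# Plain text: words on the same image line are joined with spaces, and a newline
# is emitted only at the end of each line (Tesseract-style), not after every word.
# Sorting by x assumes a left-to-right script.
def plain_text(words)
  words
    .uniq { |word| word_key(word) }
    .group_by(&:line)
    .sort_by { |line, _| line }
    .map { |_, line_words| line_words.sort_by(&:x).map(&:text).join(" ") }
    .join("\n")
end

words = [
  Word.new(text: "Hello", x: 10, y: 5, w: 40, h: 12, line: 0),
  Word.new(text: "Hello", x: 10, y: 5, w: 40, h: 12, line: 0), # duplicate entry
  Word.new(text: "world", x: 60, y: 5, w: 45, h: 12, line: 0),
  Word.new(text: "Again", x: 10, y: 25, w: 42, h: 12, line: 1)
]

word_coordinates(words)["Hello"] # => [[10, 5, 40, 12]]
plain_text(words)                # => "Hello world\nAgain"
```

Deduplicating on the full (text, box) key rather than on the text alone keeps legitimate repeats of the same word elsewhere on the page, which still need their own annotations.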
This commit ensures that the word coordinates generator produces unique
values. In IIIF Print's version of the same generator, the same word
coordinates have been observed appearing multiple times, which results in
the UV showing multiple annotations for the same word at the same place.
This commit also sets up the text generator to produce a more
Tesseract-like text file, where extra spaces are omitted for cleaner
overall output. Finally, it sets up the ALTO generator so that it does
not emit duplicate word coordinates either.
- Resolves: #6
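The whitespace cleanup and the ALTO deduplication from this commit could look roughly like the sketch below. It is a hedged illustration only: `clean_text`, `alto_strings`, `XML_ESCAPES`, and the word hash keys (`:content`, `:hpos`, `:vpos`, `:width`, `:height`) are hypothetical names; only the ALTO `String` element and its `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes come from the ALTO schema.

```ruby
# Hypothetical helper: collapse runs of spaces/tabs inside each line and trim
# line ends, keeping the line structure intact.
def clean_text(raw)
  raw.each_line.map { |line| line.gsub(/[ \t]+/, " ").strip }.join("\n")
end

# Minimal XML attribute escaping for word content.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }.freeze

# Hypothetical helper: emit one ALTO <String> element per distinct word box,
# so the same word at the same position is never written twice.
def alto_strings(words)
  words
    .uniq { |word| word.values_at(:content, :hpos, :vpos, :width, :height) }
    .map do |word|
      format('<String CONTENT="%s" HPOS="%d" VPOS="%d" WIDTH="%d" HEIGHT="%d"/>',
             word[:content].to_s.gsub(/[&<>"']/, XML_ESCAPES),
             word[:hpos], word[:vpos], word[:width], word[:height])
    end
end

clean_text("Hello   world  \n\n  Again\t here ") # => "Hello world\n\nAgain here"

alto_words = [
  { content: "Hello", hpos: 10, vpos: 5, width: 40, height: 12 },
  { content: "Hello", hpos: 10, vpos: 5, width: 40, height: 12 }, # duplicate entry
  { content: "A&B", hpos: 60, vpos: 5, width: 30, height: 12 }
]
alto_strings(alto_words).size # => 2
```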
Depends on:
Related to:
With the coordinates done, then: