🎁 Complete WordCoordinatesGenerator #6

Closed
Tracked by #421
jeremyf opened this issue May 8, 2023 · 0 comments · Fixed by #34

jeremyf commented May 8, 2023

This builds on the `IiifPrint::TextExtraction::AltoReader#text` logic. It assumes that we're building from the derived hOCR file; however, we could skip the hOCR generator.
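To make the "building from the derived hOCR file" idea concrete, here is a minimal sketch of pulling words and their bounding boxes out of hOCR markup. It is not the DerivativeRodeo or IIIF Print implementation: `HOCR_WORD` and `words_from_hocr` are hypothetical names, and the sketch assumes Tesseract-style output where each word is a `span` with class `ocrx_word`, the `class` attribute precedes `title`, and `title` carries `bbox x0 y0 x1 y1`.

```ruby
# Hypothetical helper, not part of DerivativeRodeo or IIIF Print.
# Assumes Tesseract-style hOCR: <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>word</span>
HOCR_WORD = %r{<span[^>]*class=['"]ocrx_word['"][^>]*title=['"][^'"]*bbox (\d+) (\d+) (\d+) (\d+)[^'"]*['"][^>]*>(.*?)</span>}m

# Returns one hash per word with its top-left corner, width, and height.
def words_from_hocr(hocr)
  words = hocr.scan(HOCR_WORD).map do |x0, y0, x1, y1, inner|
    # Drop any inline markup (e.g. <strong>) nested inside the word span.
    text = inner.gsub(/<[^>]+>/, "").strip
    {
      word: text,
      x: x0.to_i,
      y: y0.to_i,
      width: x1.to_i - x0.to_i,
      height: y1.to_i - y0.to_i
    }
  end
  # Skip spans that contained no visible text.
  words.reject { |w| w[:word].empty? }
end

sample = "<span class='ocr_line' title='bbox 0 0 200 120'>" \
         "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> " \
         "<span class='ocrx_word' id='word_1_2' title='bbox 100 92 150 116; x_wconf 90'>world</span>" \
         "</span>"

words_from_hocr(sample).each { |w| puts w.inspect }
```

A regular expression keeps the sketch dependency-free; a real generator would more likely use a proper XML/HTML parser so that attribute order and quoting do not matter.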

Depends on:

Related to:

With the coordinates done, then:

jeremyf changed the title from "Write DerivativeRodeo::Generators::TextGenerator" to "🎁 Write DerivativeRodeo::Generators::TextGenerator" May 18, 2023
jeremyf changed the title from "🎁 Write DerivativeRodeo::Generators::TextGenerator" to "🎁 Complete WordCoordinatesGenerator" May 18, 2023
kirkkwang added a commit that referenced this issue May 25, 2023
This commit ensures that the word coordinates generator produces
unique values.  Duplicates have been observed in IIIF Print's version
of the same generator, where the same word coordinates appear multiple
times and result in the UV having multiple annotations for the same
word at the same place.  This also sets up the text generator for a
more Tesseract-like text file, where carriage returns follow the line
breaks of the text on the image rather than appearing after each word.
It also sets up the alto generator to avoid duplicate word coordinates.

- Resolves: #6
kirkkwang added a commit that referenced this issue May 25, 2023
This commit ensures that the word coordinates generator produces
unique values.  Duplicates have been observed in IIIF Print's version
of the same generator, where the same word coordinates appear multiple
times and result in the UV having multiple annotations for the same
word at the same place.  This also sets up the text generator for a
more Tesseract-like text file, where extra spaces are omitted for an
overall cleaner output.  It also sets up the alto generator to avoid
duplicate word coordinates.

- Resolves: #6
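
The uniqueness requirement described in these commits can be sketched as follows. This is an illustration only, not the code from the referenced commits: `coords_by_word` is a hypothetical name, and the input shape (hashes with `:word`, `:x`, `:y`, `:width`, `:height`) and the word-to-boxes output shape are assumptions made for the example.

```ruby
# Hypothetical helper, not taken from the referenced commits.
# Groups bounding boxes by word and drops exact repeats, so a viewer
# would receive at most one annotation per word per location.
def coords_by_word(words)
  words.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |w, coords|
    box = [w[:x], w[:y], w[:width], w[:height]]
    # Only record a box the first time it is seen for this word.
    coords[w[:word]] << box unless coords[w[:word]].include?(box)
  end
end

words = [
  { word: "the", x: 1,  y: 2, width: 3, height: 4 },
  { word: "the", x: 1,  y: 2, width: 3, height: 4 }, # exact duplicate
  { word: "cat", x: 5,  y: 2, width: 3, height: 4 },
  { word: "the", x: 10, y: 2, width: 3, height: 4 }  # same word, new place
]

puts coords_by_word(words).inspect
```

The same word at a different position is kept, since it is a distinct occurrence on the page; only identical word-and-box pairs are collapsed.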