The Text Granularity Extension reference can be found here. Currently supported values:
Page: A page in a paginated document
Block: An arbitrary region of text
Paragraph: A paragraph
Line: A topographic line
Word: A single word
Glyph: A single glyph or symbol
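For illustration, here is a minimal sketch of how a client might tag an annotation with one of the values above and reject anything outside the list. It assumes the extension in question is the IIIF Text Granularity Extension and that the value is carried in a `textGranularity` property in lowercase form. The helper name `make_annotation`, the example URLs, and the dict layout are hypothetical and not taken from the reference; the dict is a fragment, not a complete annotation.

```python
# Values listed above; the lowercase spelling is an assumption of this sketch.
SUPPORTED_GRANULARITIES = {"page", "block", "paragraph", "line", "word", "glyph"}


def make_annotation(anno_id, target, text, granularity):
    """Build a minimal annotation-like dict tagged with a text granularity.

    Hypothetical helper: raises ValueError for any value that is not in
    the currently supported list.
    """
    if granularity not in SUPPORTED_GRANULARITIES:
        raise ValueError(f"unsupported text granularity: {granularity!r}")
    return {
        "id": anno_id,
        "type": "Annotation",
        "textGranularity": granularity,  # assumed property name
        "body": {"type": "TextualBody", "value": text},
        "target": target,
    }


line_anno = make_annotation(
    "https://example.org/anno/1",                    # placeholder URL
    "https://example.org/canvas/1#xywh=0,0,100,20",  # placeholder URL
    "Hello",
    "line",
)
```

With this check in place, a layout element such as a table or a section has no value it can be tagged with, which is the gap this issue is about.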
Looking at the extension's documentation, I wonder whether it would be useful to extend the text granularity extension with values that common AI layout analysis APIs return but that it does not currently cover. Below I list the return values from Azure and AWS that the extension doesn't already cover.
Tables: Tabular content identified and extracted from the document. Tables relate to tables identified by the pretrained layout model. Content labeled as tables is extracted as structured fields in the documents object.
Figures: Figures (charts, images) identified and extracted from the document, providing visual representations that aid in the understanding of complex information.
Sections: Hierarchical document structure identified and extracted from the document. Section or subsection with the corresponding elements (paragraph, table, figure) attached to it.
I think a value that indicates a section start would be the most valuable addition, but all of the AWS/tesseract values seem useful.
AI models are increasingly used for these kinds of tasks, which may strengthen the case for updating the extension.