Translation use case? #12060

Tejaswgupta · 2024-05-06T18:03:05Z

We've fine-tuned a transformer-based translation engine that works on whole paragraphs to keep translations contextual. We had been using Tesseract/hOCR for paragraph-level extraction, but the hOCR library we used is now obsolete. PP-OCR seems like a promising solution, but I couldn't find any resources on paragraph-level extraction.

Can someone shed some light on this? Thanks!

GreatV · 2024-05-25T05:35:35Z

Maintainer

Please refer to: https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_en.md#-tutorials

Tejaswgupta · 2024-05-25T06:45:16Z

Author

@GreatV I've gone through the docs. The only relevant thing was PP-Structure, but that's overkill and would require more work to extract the components we need for our use case.

GreatV · 2024-05-25T06:55:06Z

Maintainer

@Tejaswgupta Sorry, as far as I know, PaddleOCR doesn't do direct paragraph-level detection and recognition.
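
One possible workaround, sketched here under assumptions rather than as a PaddleOCR feature, is to take line-level OCR output and merge lines into paragraphs by vertical spacing. The function name `group_lines_into_paragraphs` and the `gap_factor` threshold are hypothetical, and the input is assumed to be a list of `(box, text)` pairs where `box` is four `[x, y]` corner points (the shape line-level detectors commonly return). It also assumes a single-column page where each box is one full text line; multi-column layouts would need column splitting first.

```python
from statistics import median


def group_lines_into_paragraphs(lines, gap_factor=0.6):
    """Merge OCR text lines into paragraphs using vertical gaps.

    lines: list of (box, text); box is four [x, y] corner points (assumed format).
    gap_factor: hypothetical tuning knob; a gap larger than
                gap_factor * median line height starts a new paragraph.
    """
    if not lines:
        return []

    # Reduce each quadrilateral to its vertical extent (top, bottom).
    spans = []
    for box, text in lines:
        ys = [point[1] for point in box]
        spans.append((min(ys), max(ys), text))

    # Read top to bottom (single-column assumption).
    spans.sort(key=lambda span: span[0])
    typical_height = median(bottom - top for top, bottom, _ in spans)

    paragraphs = [[spans[0][2]]]
    prev_bottom = spans[0][1]
    for top, bottom, text in spans[1:]:
        if top - prev_bottom > gap_factor * typical_height:
            paragraphs.append([text])      # large gap: new paragraph
        else:
            paragraphs[-1].append(text)    # small gap: same paragraph
        prev_bottom = bottom

    return [" ".join(parts) for parts in paragraphs]
```

Each returned string could then be passed to the translation model as one unit. The threshold would likely need tuning per document type, since line spacing varies between scans.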
