
Add the PARSeq model for Scene Text Recognition #2036

Open · wants to merge 15 commits into master
Conversation

@gowthamkpr (Collaborator) commented Jan 7, 2025

Adds the PARSeq OCR model for scene text recognition (STR): https://arxiv.org/abs/2207.06966

Features:

  • iterative autoregressive and non-autoregressive decoding and refinement, fully performed within Keras' computation graph
  • training based on permutation sequences, as described in the paper
  • ViT-based recognition model
  • well-tested model with state-of-the-art performance
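
To make the permutation-based training objective concrete: PARSeq trains on a set of decoding orders rather than only left-to-right. The sketch below is an illustrative stand-alone version (the function name and sampling details are mine, not the PR's); it draws distinct position permutations while always keeping the canonical autoregressive order, which the paper also includes in every sampled set:

```python
import random

def sample_permutations(seq_len, num_perms, seed=None):
    """Sample distinct decoding-order permutations for permutation
    language modeling. The canonical left-to-right order is always
    included first. (The paper additionally pairs each sampled
    permutation with its reverse; omitted here for brevity.)"""
    rng = random.Random(seed)
    perms = [list(range(seq_len))]  # canonical AR order first
    while len(perms) < num_perms:
        p = list(range(seq_len))
        rng.shuffle(p)
        if p not in perms:  # keep the sampled orders distinct
            perms.append(p)
    return perms
```

During training, each sampled order defines the attention mask used for one forward pass, so the same parameters learn to decode in arbitrary orders.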

This is (so far) a 1:1 translation of PaddleOCR's implementation (Apache 2.0-licensed), in particular of

Next steps:

  • Port PaddleOCR weights
  • Adaptations to meet KerasHub style
  • Pre/postprocessing, and a dedicated Task for OCR
  • Add tests

@mattdangerw (Member)
Thanks! When this is done, could you attach a colab showing basic usage? Or even just a code snippet that shows the full usage including data input formats would be helpful.

@gowthamkpr (Collaborator, Author) commented Jan 21, 2025

A few comments on the code layout of this first version:

  • There's a strange bug where tests for all backends pass when run on their own, but fail when run after other tests. This occurs because Keras rewrites the class hierarchy of Backbone as soon as another model is instantiated with __init__(inputs=..., outputs=...) (see here); the expected __init__ arguments therefore change, and the inputs and outputs arguments always have to be provided.
    If this is a known problem with a known solution, please let me know; otherwise I'll investigate further and document exactly where the rewrite happens. Due to the way inference and training work with PARSeq, we require a call method here.

  • Compared to the existing ViT implementation, the ViT implementation I've ported from PaddleOCR contains some additional logic for Drop Path (stochastic depth) regularisation, but the pretrained weights do not appear to use it. If the goal is only to run the pretrained weights, we might instead just reuse the existing ViT implementation, if that is preferred.
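
    For context, the core of Drop Path is small. Here is a stand-alone NumPy sketch of the idea (not the PR's Keras layer): entire residual branches are zeroed per sample during training, and survivors are rescaled so the expected value is unchanged:

    ```python
    import numpy as np

    def drop_path(x, drop_prob, rng, training=True):
        """Stochastic depth ("Drop Path"): randomly zero the residual
        branch for whole samples and rescale the survivors. Stand-alone
        NumPy illustration, not the PR's Keras implementation."""
        if not training or drop_prob == 0.0:
            return x
        keep_prob = 1.0 - drop_prob
        # One Bernoulli draw per sample, broadcast over all other axes.
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = rng.binomial(1, keep_prob, size=shape).astype(x.dtype)
        return x / keep_prob * mask
    ```

    Since the pretrained weights were apparently trained without it, the layer would be inert at inference time either way (at drop_prob=0 it is the identity).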

  • For now, I've implemented a simple detokenize method. Alternatively, we could add a full-fledged Tokenizer that supports the model's character set, either by extending the existing ByteTokenizer to accept a given character set or by creating a new one.
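
    As an illustration of what such a simple detokenize step amounts to, here is a stand-alone sketch; the character set, id layout, and EOS handling below are assumptions for the example, not the PR's actual mapping:

    ```python
    # Hypothetical character set and special-token layout: id 0 is EOS,
    # ids 1..len(CHARSET) index into CHARSET. The PR's mapping may differ.
    CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
    EOS_ID = 0

    def detokenize(token_ids):
        """Map predicted token ids to a string, stopping at the first EOS."""
        chars = []
        for tid in token_ids:
            if tid == EOS_ID:
                break
            chars.append(CHARSET[tid - 1])
        return "".join(chars)
    ```

    A dedicated Tokenizer would mainly add the inverse direction (tokenize) plus configuration of the character set, which is what extending ByteTokenizer would buy.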

@gowthamkpr gowthamkpr marked this pull request as ready for review January 21, 2025 16:01