Xerox 2.0 Bug in this script #2

Open
tobltobs opened this issue Aug 15, 2013 · 11 comments

Comments

@tobltobs

The script does some funny things with, e.g., an "8":

[Image: original]

[Image: out]

@jasonlfunk
Owner

Yeah. I know the script doesn't work very well. I'm not actively supporting it.

palaziv pushed a commit to palaziv/ocr-text-extraction that referenced this issue Oct 15, 2014
The SIZE of the EB should be greater than 15 pixels but smaller than 1/5th of the image dimension to be considered for further processing.

This fix should also solve this issue: jasonlfunk#2
@RusEu

RusEu commented Mar 1, 2016

same problem here :(

@RusEu

RusEu commented Mar 1, 2016

Also with the letter B.

@RusEu

RusEu commented Mar 1, 2016

After commenting out this block:
# if count_children(index, h_, contour) > 2:
#     if DEBUG:
#         print "\t skipping, is a container of letters"
#     return True
I get the B and the 8, but they come out very dark and the OCR doesn't recognize them well. Any idea how to solve this?

@varunkumar2310

varunkumar2310 commented Oct 31, 2016

The maximum number of edge-boxes that can occur inside a text edge-box is 2,
but each edge-box has 2 contours: an outer contour and an inner contour.
Take the case of A: total contours = outer contour of A + inner contour of A + outer contour of the delta (the triangular hole) + inner contour of the delta = 4 contours.
An edge-box can have at most 2 edge-boxes completely enclosed in it
(max enclosed edge-boxes for English = 2).
This means the maximum number of contours that can be enclosed inside a character (inside its outer contour)
= inner contour of the character + 2 * max edge-boxes = 1 + 2 * 2 = 5.
Change the max contours depending on the language if necessary.

Therefore, in include_box, modify the two checks to:
if is_child(index, h_) and count_children(get_parent(index, h_), h_, contour) <= 5:
if count_children(index, h_, contour) > 5:

I coded this in OpenCV 3.1 with C++, and the change above works for me.
Correct me if my logic is wrong.
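
As a rough check of the arithmetic above, the counting can be sketched in Python. This is a hand-built illustration, not the script's own count_children: the names count_descendants, max_enclosed_contours, and B_HIERARCHY are hypothetical, and the rows assume OpenCV's [next, previous, first_child, parent] hierarchy layout with the nesting described in this comment.

```python
def count_descendants(index, hierarchy):
    """Count every contour nested anywhere below `index`.

    Each hierarchy row is [next, previous, first_child, parent];
    -1 means "none".
    """
    total = 0
    child = hierarchy[index][2]
    while child != -1:
        # Count the child itself plus everything nested inside it,
        # then move on to the child's next sibling.
        total += 1 + count_descendants(child, hierarchy)
        child = hierarchy[child][0]
    return total


def max_enclosed_contours(max_edge_boxes):
    """Inner contour of the character + 2 contours per enclosed edge-box."""
    return 1 + 2 * max_edge_boxes


# Hypothetical hierarchy for a "B", following the model in this comment.
B_HIERARCHY = [
    [-1, -1, 1, -1],   # 0: outer contour of the letter
    [-1, -1, 2, 0],    # 1: inner contour of the letter
    [4, -1, 3, 1],     # 2: outer contour of the upper hole
    [-1, -1, -1, 2],   # 3: inner contour of the upper hole
    [-1, 2, 5, 1],     # 4: outer contour of the lower hole
    [-1, -1, -1, 4],   # 5: inner contour of the lower hole
]

enclosed = count_descendants(0, B_HIERARCHY)
limit = max_enclosed_contours(2)
print(enclosed, limit)
```

Under these assumptions both numbers come out equal, which is why a limit of 5 admits a "B" or an "8" while a limit of 2 does not. The real hierarchy returned for a given image may nest differently.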

Images:
original
testing
processed_with_5
rejected_with_5
processed_with_2
rejected_with_2

@guddulrk

guddulrk commented Apr 8, 2017

I am getting the following error:

inf = file(scratch_text_name_root + '.txt')
NameError: name 'file' is not defined

Any help please??

@tsjason

tsjason commented Apr 10, 2017

@guddulrk No idea. It looks like you might be missing some libraries or something. But I haven't used this in ages so I'm not sure.
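
A likely cause, assuming the script is being run under Python 3: file() is a Python 2 built-in that Python 3 removed, and open() is the replacement. The sketch below reuses the variable name from the traceback; the helper read_text and the temporary file are made up for illustration.

```python
import os
import tempfile


def read_text(path):
    """Read a text file with open(), which works in Python 2 and 3."""
    with open(path) as inf:
        return inf.read()


# Create a throwaway file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    scratch_text_name_root = os.path.join(tmp, "scratch")
    with open(scratch_text_name_root + ".txt", "w") as out:
        out.write("8B\n")

    # Python 2 style was: inf = file(scratch_text_name_root + '.txt')
    text = read_text(scratch_text_name_root + ".txt")

print(repr(text))
```

The print statements quoted earlier in this thread are also Python 2 syntax, so other lines of the script would probably need porting as well.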

@itcthienkhiem

same error.

@itcthienkhiem

[Image: capture]
Is B still a problem?
How can I fix it?

@NightFury13

Pappu! Thanks dude! :D @varunkumar2310

@sstefanov

sstefanov commented Apr 27, 2020

Because Canny is used, each contour is detected twice.
This is why the B and 8 characters are not handled properly: they have 4 children, not 2.

To fix this, just replace 2 with 4 in both checks:
if is_child(index, h_) and count_children(get_parent(index, h_), h_, contour) <= 2:
must be
if is_child(index, h_) and count_children(get_parent(index, h_), h_, contour) <= 4:
and
if count_children(index, h_, contour) > 2:
must be
if count_children(index, h_, contour) > 4:

See my pull request.
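
The effect of the two changed comparisons can be sketched with a small stand-in predicate. This is not the script's include_box: keep_box and max_children are hypothetical names, the child counts are passed in as plain numbers, and it assumes both checks mean "skip this box", as the "skipping" debug message quoted earlier suggests.

```python
def keep_box(is_child, parent_children, own_children, max_children=4):
    """Return True if a box should be kept for further processing."""
    # A child of a parent with few children is treated as the
    # interior of a letter (for example a hole in an "8").
    if is_child and parent_children <= max_children:
        return False
    # A box with many children is treated as a container of letters.
    if own_children > max_children:
        return False
    return True


# An "8" whose outer box reports 4 children because Canny doubles them.
print(keep_box(False, 0, 4, max_children=2))  # outer box with the old limit
print(keep_box(False, 0, 4, max_children=4))  # outer box with the new limit
print(keep_box(True, 4, 0, max_children=2))   # a hole with the old limit
print(keep_box(True, 4, 0, max_children=4))   # a hole with the new limit
```

Keeping the limit in one parameter means the two comparisons cannot drift apart, and the value can be raised for scripts whose characters enclose more regions.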
