Reading the characters in the image file with my own handwriting #18

Open

ozlem-atiz opened this issue Jan 17, 2022 · 1 comment

ozlem-atiz commented Jan 17, 2022

Hi, I'm using pytesseract to recognize the characters in a JPEG of a scanned document, but it performs very poorly on handwritten text. How can I use your code on my image file? @samkit-jain

samkit-jain (Owner) commented

Hi @ozlem-atiz This repo is not designed for recognising continuous handwritten text (feel free to raise a PR that adds support for it). If your text is not continuous, or if you can crop each character individually from the image, like the example at https://github.com/samkit-jain/Handwriting-Recognition/blob/master/Screenshots/label_5.png, then you can run the model on those crops. The method

def get_contours(self):
    """
    Method to find contours in an image, crop them, and return a list of the cropped contours
    """
    images = []
    main_image = self.img
    orig_image = main_image.copy()
    # convert to greyscale and apply Gaussian filtering
    main_image = cv2.cvtColor(src=main_image, code=cv2.COLOR_BGR2GRAY)
    main_image = cv2.GaussianBlur(src=main_image, ksize=(5, 5), sigmaX=0)
    # threshold the image
    _, main_image = cv2.threshold(src=main_image, thresh=127, maxval=255, type=cv2.THRESH_BINARY)
    # find contours in the image (two-value return signature of OpenCV 4.x)
    contours, _ = cv2.findContours(image=main_image.copy(), mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE)
    # get the bounding rectangle of each contour
    bboxes = [cv2.boundingRect(array=contour) for contour in contours]
    for bbox in bboxes:
        x, y, width, height = bbox[:4]
        images.append(orig_image[y:y + height, x:x + width])
    return images
is responsible for cropping the character bounding boxes from an image.
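
For context, here is a minimal standalone sketch of that same cropping logic, useful for checking how well your handwriting segments before involving the model. The input path is hypothetical, and this is an adaptation for illustration rather than the repo's exact API:

import cv2

# hypothetical input: a photo or scan of your handwriting
img = cv2.imread("my_handwriting.jpg")

# same preprocessing as get_contours: greyscale, blur, binarise
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(grey, (5, 5), 0)
_, binary = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)

# external contours only, ideally one per character
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# save each crop so you can eyeball the segmentation quality
for i, contour in enumerate(contours):
    x, y, w, h = cv2.boundingRect(contour)
    cv2.imwrite(f"crop_{i}.png", img[y:y + h, x:x + w])

If the crops look wrong (characters split or merged), the thresholding parameters are the first thing to tune. The method below then makes each crop compatible with the EMNIST input format.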
@staticmethod
def convert_to_emnist(img):
    """
    Method to make an image EMNIST-format compatible. img is a cropped version of the character image.
    The conversion process is described in section II-A of the EMNIST paper available at https://arxiv.org/abs/1702.05373v1
    """
    height, width = img.shape[:2]
    # create a square frame with sides equal to the larger dimension
    emnist_image = np.zeros(shape=(max(height, width), max(height, width), 3), dtype=np.uint8)
    # centre the cropped image in it
    offset_height = int(float(emnist_image.shape[0] / 2.0) - float(height / 2.0))
    offset_width = int(float(emnist_image.shape[1] / 2.0) - float(width / 2.0))
    emnist_image[offset_height:offset_height + height, offset_width:offset_width + width] = img
    # resize to 26x26 using bi-cubic interpolation
    emnist_image = cv2.resize(src=emnist_image, dsize=(26, 26), interpolation=cv2.INTER_CUBIC)
    # pad the 26x26 image to 28x28 so that characters don't touch the boundaries
    fin_image = np.zeros(shape=(28, 28, 3), dtype=np.uint8)
    fin_image[1:27, 1:27] = emnist_image
    return fin_image
This converts a crop to the required EMNIST format; you can then pass the result to the model to get a prediction. Of course, you will have to make some changes and adapt the code to your use case.
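
To make the last step concrete, here is a rough sketch of running one converted crop through a trained model. The model file name, the Keras loading call, and the assumption that the network takes a 28x28 single-channel input scaled to [0, 1] are all illustrative; check how this repo actually saves and feeds its model:

import cv2
import numpy as np
from tensorflow.keras.models import load_model  # assumes a Keras-saved model

model = load_model("emnist_model.h5")  # hypothetical path to the trained model

crop = cv2.imread("crop_0.png")  # one character crop, e.g. from the snippet above
emnist_img = convert_to_emnist(crop)  # the method shown above, called as a plain function here

# assumed input contract: 28x28 greyscale, normalised to [0, 1], with a leading batch dimension
grey = cv2.cvtColor(emnist_img, cv2.COLOR_BGR2GRAY)
batch = grey.reshape(1, 28, 28, 1).astype("float32") / 255.0

probabilities = model.predict(batch)
print(int(np.argmax(probabilities)))  # index of the predicted EMNIST class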
