Merge pull request #2 from Bhashini-IITJ/east_script_changes
East script changes
Showing 17 changed files with 116 additions and 148 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
__pycache__ | ||
results |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
|
||
|
||
## Scene Text Detection - East | ||
To get started, create a virtual environment and install PyTorch 2.4 or later. | ||
### Installation | ||
```commandline | ||
conda create -n east_infer python=3.12 | ||
conda activate east_infer | ||
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia | ||
cd SceneTextDetection/East/ | ||
pip install -r requirements.txt | ||
``` | ||
|
||
### Inference | ||
|
||
Use the script ```infer.py``` for inference. Run ```python infer.py -h``` for details on the command-line options. | ||
|
||
Model checkpoints are also available from the GitHub release [assets](https://github.com/Bhashini-IITJ/SceneTextDetection/releases/tag/EAST). | ||
``` | ||
python infer.py --image_path ../demo_images/image_90.jpg --model_checkpoint tmp/epoch_990_checkpoint.pth.tar | ||
``` | ||
|
||
### Acknowledgement | ||
EAST re-implementation [repository](https://github.com/foamliu/EAST). |
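The `infer.py` script added in this commit prints a dictionary of the form `{'detections': [...]}`, where each detection is a list of four `[x, y]` corner points. As a minimal sketch of how a caller might consume that output, the helper below converts each quadrilateral to an axis-aligned rectangle. The name `quads_to_rects` and the sample coordinates are hypothetical and are not part of the repository.

```python
def quads_to_rects(result):
    """Convert each four-point detection into (x_min, y_min, x_max, y_max)."""
    rects = []
    for quad in result["detections"]:
        xs = [point[0] for point in quad]
        ys = [point[1] for point in quad]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


# Hypothetical sample in the same shape as the dictionary infer.py prints
sample = {"detections": [[[12, 30], [118, 26], [120, 58], [14, 62]]]}
print(quads_to_rects(sample))  # [(12, 26, 120, 62)]
```

An axis-aligned rectangle is looser than the original quadrilateral for rotated text, so this conversion suits cropping or quick visual checks rather than precise evaluation.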
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
import os | ||
import torch | ||
import cv2 | ||
import numpy as np | ||
import time | ||
import warnings | ||
import config as cfg | ||
from model import East | ||
import utils | ||
|
||
# Suppress warnings | ||
warnings.filterwarnings("ignore") | ||
|
||
def predict(image_path, model_checkpoint): | ||
    # Load the image; cv2.imread returns None when the file cannot be read | ||
    im = cv2.imread(image_path) | ||
    if im is None: | ||
        raise FileNotFoundError(f"Could not read image: {image_path}") | ||
    im = im[:, :, ::-1]  # OpenCV loads BGR; reverse the channels to RGB | ||
|
||
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | ||
|
||
    # Initialize the EAST model and wrap it for multi-GPU use | ||
    model = East(device) | ||
    model = torch.nn.DataParallel(model, device_ids=cfg.gpu_ids) | ||
|
||
    # Load the model checkpoint with weights_only=True | ||
    checkpoint = torch.load(model_checkpoint, map_location=device, weights_only=True) | ||
    model.load_state_dict(checkpoint['state_dict']) | ||
    model.eval() | ||
|
||
    # Resize image and convert to a 1 x C x H x W float tensor | ||
    im_resized, (ratio_h, ratio_w) = utils.resize_image(im) | ||
    im_resized = im_resized.astype(np.float32).transpose(2, 0, 1) | ||
    im_tensor = torch.from_numpy(im_resized).unsqueeze(0).cpu() | ||
|
||
    # Inference (gradients are not needed, so disable autograd) | ||
    timer = {'net': 0, 'restore': 0, 'nms': 0} | ||
    start = time.time() | ||
    with torch.no_grad(): | ||
        score, geometry = model(im_tensor) | ||
    timer['net'] = time.time() - start | ||
|
||
    # Move outputs to NumPy in N x H x W x C layout | ||
    score = score.permute(0, 2, 3, 1).cpu().numpy() | ||
    geometry = geometry.permute(0, 2, 3, 1).cpu().numpy() | ||
|
||
    # Detect boxes | ||
    boxes, timer = utils.detect( | ||
        score_map=score, geo_map=geometry, timer=timer, | ||
        score_map_thresh=cfg.score_map_thresh, box_thresh=cfg.box_thresh, | ||
        nms_thres=cfg.box_thresh  # note: the NMS threshold reuses cfg.box_thresh here | ||
    ) | ||
    bbox_result_dict = {'detections': []} | ||
|
||
    # Map boxes back to original-image coordinates and drop very small ones | ||
    if boxes is not None: | ||
        boxes = boxes[:, :8].reshape((-1, 4, 2)) | ||
        boxes[:, :, 0] /= ratio_w | ||
        boxes[:, :, 1] /= ratio_h | ||
        for box in boxes: | ||
            box = utils.sort_poly(box.astype(np.int32)) | ||
            # Skip boxes whose adjacent sides are shorter than 5 pixels | ||
            if np.linalg.norm(box[0] - box[1]) < 5 or np.linalg.norm(box[3] - box[0]) < 5: | ||
                continue | ||
            bbox_result_dict['detections'].append([ | ||
                [int(coord[0]), int(coord[1])] for coord in box | ||
            ]) | ||
|
||
    return bbox_result_dict | ||
|
||
if __name__ == "__main__": | ||
    import argparse | ||
    parser = argparse.ArgumentParser(description='Text detection using EAST model') | ||
    parser.add_argument('--image_path', type=str, required=True, help='Path to the input image') | ||
    parser.add_argument('--model_checkpoint', type=str, required=True, help='Path to the model checkpoint file') | ||
    args = parser.parse_args() | ||
|
||
    # Run prediction and get results as dictionary | ||
    detection_result = predict(args.image_path, args.model_checkpoint) | ||
    print(detection_result) |
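The post-processing step in `predict` divides box coordinates by the resize ratios and skips boxes with a side shorter than 5 pixels. The following is a simplified, dependency-free sketch of that step only, assuming the ratios are resized size divided by original size. The function name `rescale_and_filter` and the sample boxes are hypothetical, and the point-ordering step (`utils.sort_poly`) is omitted.

```python
import math


def rescale_and_filter(boxes, ratio_h, ratio_w, min_side=5):
    """Map quads from resized-image to original-image coordinates and drop tiny ones.

    boxes: list of quads, each a sequence of four (x, y) points.
    ratio_h, ratio_w: assumed to be resized size / original size per axis.
    """
    kept = []
    for quad in boxes:
        # Undo the resize: x is divided by the width ratio, y by the height ratio
        pts = [(int(x / ratio_w), int(y / ratio_h)) for x, y in quad]
        # Lengths of two adjacent sides (point 0 to 1, and point 3 to 0)
        side_a = math.dist(pts[0], pts[1])
        side_b = math.dist(pts[3], pts[0])
        if side_a < min_side or side_b < min_side:
            continue
        kept.append([list(p) for p in pts])
    return {"detections": kept}


# Hypothetical boxes detected on an image that was resized to half its size
quads = [
    [(0, 0), (100, 0), (100, 40), (0, 40)],    # kept: sides 200 and 80 after rescaling
    [(10, 10), (12, 10), (12, 50), (10, 50)],  # dropped: top side is 4 pixels after rescaling
]
print(rescale_and_filter(quads, ratio_h=0.5, ratio_w=0.5))
```

Filtering after rescaling means the 5-pixel threshold applies in original-image pixels, which matches the order of operations in the function above.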
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,2 @@ | ||
torch==2.4.1 | ||
torchvision==0.19.1 | ||
shapely==2.0.6 | ||
opencv-python==4.10.0.84 |
File renamed without changes.
File renamed without changes.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,6 @@ | ||
# SceneTextDetection | ||
# SceneTextDetection | ||
This repository provides implementations of various scene text detection models, focusing on the detection of text in images. | ||
1. **EAST**: An Efficient and Accurate Scene Text Detector ([paper](https://arxiv.org/abs/1704.03155)) | ||
|
||
# Fine-tuning | ||
The models in this repository have been fine-tuned on the [Bharat Scene Text Dataset (BSTD)](https://github.com/Bhashini-IITJ/BharatSceneTextDataset), a large-scale dataset designed specifically for scene text detection tasks. The demo images used in this repository are also sourced from this dataset. |