Generate synthetic data: text rendered over images, with ground truth for text locations and labels.


Generate data command

# Before running the script, make sure the input data is available; see ./input_data
# Also adjust the absolute paths to output_imgs used inside wordart_gen/wordart_gen_func.py (lines 27 and 30); the Docker client only supports absolute paths
bash generate_data_cmd.sh
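
The absolute-path requirement does not have to mean a hard-coded, machine-specific path. A minimal sketch, assuming the output directory can be derived from a base directory at run time; `output_dir_for` is a hypothetical helper and is not part of this repository:

```python
from pathlib import Path


def output_dir_for(base_dir, name="output_imgs"):
    """Return an absolute path to `name` under `base_dir` (hypothetical helper)."""
    # resolve() makes the path absolute and collapses ".." segments,
    # which is the form a Docker volume mount expects.
    return (Path(base_dir) / name).resolve()


if __name__ == "__main__":
    print(output_dir_for("wordart_gen"))
```

Whether this fits depends on how wordart_gen/wordart_gen_func.py builds its paths, which is not shown here.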

Description of directories:

  • ./fonts/: fonts used for rendering text on images
  • ./input_data/: scripts to get or create the input data (raw images and sample text); see ./input_data/README.md for more information
  • ./output/: generated data is saved here; it already contains some sample output data
  • ./wordart_gen/: Docker-client-based script that generates Microsoft WordArt-like text graphics; used by the main script generate_data.py

Things to consider while adding new fonts

  • Some fonts might render only part of the text
  • Some fonts might not render the text at all and show block-like placeholder glyphs instead
  • Before adding a new font, make sure it can render the text
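
One way to run the check above before adding a font is to compare the sample text against the code points the font declares. This is a sketch, not the repository's method: `missing_chars` and `font_codepoints` are hypothetical names, and reading the font assumes the third-party fontTools package is installed. A character listed in the font's cmap table can still render poorly, so a visual check remains useful.

```python
def missing_chars(text, supported_codepoints):
    """Return characters of `text` with no glyph in the font, in order, without duplicates."""
    missing = []
    for ch in text:
        if ch.isspace():
            continue  # whitespace needs no visible glyph
        if ord(ch) not in supported_codepoints and ch not in missing:
            missing.append(ch)
    return missing


def font_codepoints(font_path):
    """Read the Unicode code points a font covers (assumes fontTools is installed)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(font_path)
    try:
        # getBestCmap() maps code points to glyph names; it may be None
        # when the font has no Unicode cmap subtable.
        return set(font["cmap"].getBestCmap() or {})
    finally:
        font.close()
```

For example, `missing_chars(sample_text, font_codepoints("./fonts/some_font.ttf"))` returns an empty list when every character is covered; the font file name here is a placeholder.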

Input/output example

First Image Second Image

## Ground-truth annotations text file: one bounding box per line, in x1,y1,w,h,label format
56,266,1172,68,なズ殿諏プをつプ楡む柿之vヒあ梅覇ゴ
1054,557,132,116,城
522,618,493,59,染う胡フ辺局ゼ森れ
193,376,1010,112,亜中政ねうテジさぎゲ
62,445,113,81,バ
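
The annotation lines above can be read back with a few lines of Python. This is a minimal sketch; `Box`, `parse_annotation_line` and `parse_annotations` are hypothetical names, and it assumes (x1, y1) is the top-left corner with w and h as width and height in pixels.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: int
    y1: int
    w: int
    h: int
    label: str


def parse_annotation_line(line):
    """Parse one 'x1,y1,w,h,label' line into a Box."""
    # Split at most 4 times so a label that itself contains commas stays intact.
    x1, y1, w, h, label = line.rstrip("\n").split(",", 4)
    return Box(int(x1), int(y1), int(w), int(h), label)


def parse_annotations(text):
    """Parse a whole annotation file, skipping blank lines and '#' comment lines."""
    return [
        parse_annotation_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


if __name__ == "__main__":
    box = parse_annotation_line("1054,557,132,116,城")
    # Bottom-right corner under the top-left assumption.
    print(box.x1 + box.w, box.y1 + box.h)  # 1186 673
```

When reading the file from disk, open it with `encoding="utf-8"` so the Japanese labels decode correctly.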

WordArt-like text is also supported.