Replies: 3 comments
-
Problem solved. Just specify the dictionary at inference time: `python tools/infer/predict_rec.py --image_dir='E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/images/img_0000001.jpg' --rec_model_dir='./inference/rec_svtrv2_ch' --rec_char_dict_path='E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/labels_.txt'`
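The model outputs class indices into the dictionary it was trained with, so decoding against a different dictionary produces garbage. A minimal sketch (paths taken from this thread, run from the PaddleOCR repo root) to confirm the custom dictionary differs from the default one that `predict_rec.py` falls back to:

```python
# Sketch: compare the custom training dictionary with PaddleOCR's default.
# If they differ, predict_rec.py must be given --rec_char_dict_path explicitly,
# otherwise the predicted indices are decoded against the wrong characters.
custom_dict_path = "E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/labels_.txt"
default_dict_path = "ppocr/utils/ppocr_keys_v1.txt"  # predict_rec.py's default

def load_dict(path):
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]

custom = load_dict(custom_dict_path)
default = load_dict(default_dict_path)
print(f"custom: {len(custom)} characters, default: {len(default)} characters")
print("dictionaries identical:", custom == default)
```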
-
Which version of PaddleOCR is needed to train this?
-
What environment does this model need in order to run? I keep getting all kinds of errors and can't get it running. Is there a tutorial?
-
I used the SVTRv2 recognition model architecture together with the SVTRv2 pretrained model. The training data is synthetic, about 1,000,000 images. labels_.txt is not PaddleOCR's built-in character dictionary; it was regenerated from the characters that appear in the training text. I trained for five epochs; the third epoch was the best, with an accuracy of 98%.
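A dictionary like labels_.txt can be rebuilt from the training labels in a few lines; a minimal sketch, assuming the paths from the config below and PaddleOCR's usual tab-separated `image_path<TAB>label` label format:

```python
# Minimal sketch: rebuild a character dictionary from a PaddleOCR label file.
# Paths are the ones from the config below; adjust as needed.
label_file = "E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/train_list.txt"
dict_file = "E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/labels_.txt"

chars = set()
with open(label_file, "r", encoding="utf-8") as f:
    for line in f:
        # Each line is "relative/image/path\ttranscription".
        _, text = line.rstrip("\n").split("\t", 1)
        chars.update(text)

chars.discard(" ")  # the space character is handled by use_space_char: true

with open(dict_file, "w", encoding="utf-8") as f:
    f.write("\n".join(sorted(chars)) + "\n")
```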
The config file is as follows:

```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 20 # 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_svtrv2_ch
  save_epoch_step: 1
  eval_batch_step: [0, 500]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/openatom_rec_svtrv2_ch_train/best_accuracy
  checkpoints: ./output/rec_svtrv2_ch/best_accuracy
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/
  character_dict_path: E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/labels_.txt #ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_svrtv2.txt

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  epsilon: 1.e-8
  weight_decay: 0.05
  no_weight_decay_name: norm
  one_dim_param_no_weight_decay: True
  lr:
    name: Cosine
    learning_rate: 0.001 # 8gpus 192bs
    warmup_epoch: 5

Architecture:
  model_type: rec
  algorithm: SVTR_HGNet
  Transform:
  Backbone:
    name: SVTRv2
    use_pos_embed: False
    dims: [128, 256, 384]
    depths: [6, 6, 6]
    num_heads: [4, 8, 12]
    mixer: [['Conv','Conv','Conv','Conv','Conv','Conv'],['Conv','Conv','Global','Global','Global','Global'],['Global','Global','Global','Global','Global','Global']]
    local_k: [[5, 5], [5, 5], [-1, -1]]
    sub_k: [[2, 1], [2, 1], [-1, -1]]
    last_stage: False
    use_pool: True
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 256
            depth: 2
            hidden_dims: 256
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length
          num_decoder_layers: 2

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/images
    ext_op_transform_idx: 1
    label_file_list:
      - E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/train_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecAug:
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 192
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/images
    label_file_list:
      - E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/val_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
```
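For completeness, a minimal sketch of launching training with this config from the PaddleOCR repo root (the config path matches the export command below; the single-GPU form is shown, the multi-GPU form goes through `paddle.distributed.launch`):

```python
# Sketch: launch training with the config above from the PaddleOCR repo root.
# configs/rec/SVTRv2/rec_svtrv2_ch.yml is assumed to hold that YAML.
import subprocess

# Single-GPU run; for multi-GPU, wrap with
# "python -m paddle.distributed.launch --gpus '0,1,...' tools/train.py -c ...".
subprocess.run([
    "python", "tools/train.py",
    "-c", "configs/rec/SVTRv2/rec_svtrv2_ch.yml",
], check=True)
```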
I converted the trained model to an inference model with the command `python tools/export_model.py -c configs/rec/SVTRv2/rec_svtrv2_ch.yml`.
The final recognition results are garbled.
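As the first reply in this thread points out, the garbled output is most likely a dictionary mismatch at inference time: `tools/infer/predict_rec.py` falls back to `ppocr/utils/ppocr_keys_v1.txt` unless `--rec_char_dict_path` is given. A minimal sketch of the export-then-infer sequence with the custom dictionary (the `./inference/rec_svtrv2_ch` output directory is an assumption matching that reply):

```python
# Sketch: export the trained model, then run inference with the same custom
# dictionary used for training. "./inference/rec_svtrv2_ch" is an assumed
# output directory, consistent with the first reply in this thread.
import subprocess

dict_path = "E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/labels_.txt"

# Export the trained checkpoint to an inference model.
subprocess.run([
    "python", "tools/export_model.py",
    "-c", "configs/rec/SVTRv2/rec_svtrv2_ch.yml",
    "-o", "Global.save_inference_dir=./inference/rec_svtrv2_ch",
], check=True)

# Run recognition; without --rec_char_dict_path the default dictionary is used
# and the decoded text comes out garbled.
subprocess.run([
    "python", "tools/infer/predict_rec.py",
    "--image_dir=E:/BaiduNetdiskDownload/DataSet/Chinese_dataset/images/img_0000001.jpg",
    "--rec_model_dir=./inference/rec_svtrv2_ch",
    f"--rec_char_dict_path={dict_path}",
], check=True)
```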