A Multi-scale Deformable Convolution Network Model for Text Recognition

Authors
Cheng, Lang [1]
Yan, Junhong [1]
Chen, Minghui [1]
Lu, Yuanwen [1]
Li, Yunhong [1]
Hu, Lei [1]
Affiliation
[1] Jiangxi Normal Univ, Sch Comp & Informat Engn, Nanchang 330022, Jiangxi, Peoples R China
Keywords
Text recognition; Multi-scale feature extraction; Deformable convolution; Receptive field
DOI
10.1117/12.2623370
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Natural scene text recognition has been one of the most challenging tasks in recent years. Compared with traditional document text, natural scene text appears in varied shapes and orientations, so the accuracy of scene text recognition still needs to be improved. To locate text regions better and identify text content more accurately, we present a multi-scale deformable convolution network model for text recognition. The input image is first corrected for irregularity by a rectification network, and a ResNet with an FPN structure is used as the backbone network to achieve multi-scale feature extraction. In addition, Add-based feature fusion is adopted to reduce the loss of feature information and strengthen feature extraction in the text area. A deformable convolution block is introduced in the deep convolutional layers to improve the deformation modeling ability of convolution and expand the receptive field. The prediction module adopts the Transformer, which dispenses with the inherent sequential (previous-to-next) dependence of RNNs, enabling parallel computation and shortening the path length between long-range dependencies. To evaluate the effectiveness of the proposed method, we trained our model on a mixture of two datasets, MJSynth and SynthText, and tested it on several regular and irregular datasets. The experimental results demonstrate that this method performs well in irregular scene text recognition, especially on CUTE80.
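The Add-style fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `upsample_nearest_2x` and `fuse_add` are hypothetical, the feature maps are single-channel nested lists, and the coarse map is taken to be exactly half the resolution of the fine map. In an actual FPN, a 1x1 lateral convolution would usually match channel counts before the addition; that step is omitted here.

```python
from typing import List

# One channel of a feature map, stored as H rows of W values (illustrative only).
FeatureMap = List[List[float]]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating every value (nearest neighbour)."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_add(fine: FeatureMap, coarse: FeatureMap) -> FeatureMap:
    """Top-down fusion by element-wise addition.

    The coarse (deeper) map is upsampled to the fine map's size and added to
    it, so the fused map keeps the fine map's shape instead of growing in
    channel count as concatenation would.
    """
    up = upsample_nearest_2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the size of the coarse map")
    return [[f + u for f, u in zip(f_row, u_row)]
            for f_row, u_row in zip(fine, up)]


# Toy inputs: a 2x4 fine map and a 1x2 coarse map (made-up values).
fine = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]
coarse = [[10.0, 20.0]]

fused = fuse_add(fine, coarse)
print(fused)  # [[11.0, 12.0, 23.0, 24.0], [15.0, 16.0, 27.0, 28.0]]
```

Each fused position combines the local detail of the shallow map with the value of the deeper map covering the same region, which is one plausible reading of how addition limits information loss across scales.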
Pages: 9