Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling

Cited by: 51
Authors
Fang, Shancheng [1, 2]
Xie, Hongtao [3]
Zha, Zheng-Jun [3]
Sun, Nannan [1, 2]
Tan, Jianlong [1, 2]
Zhang, Yongdong [3]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Keywords
Text recognition; convolutional neural networks; multi-level supervised information; attention model
DOI
10.1145/3240508.3240571
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Recent dominant approaches for scene text recognition are mainly based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), where the CNN processes images and the RNN generates character sequences. Unlike these methods, we propose an attention-based architecture that is built entirely on CNNs. The distinctive characteristics of our method include: (1) The method follows an encoder-decoder architecture, in which the encoder is a two-dimensional residual CNN and the decoder is a deep one-dimensional CNN. (2) An attention module that captures visual cues and a language module that models linguistic rules are designed on an equal footing in the decoder, so attention and language can be viewed as an ensemble that boosts predictions jointly. (3) Instead of using a single loss from the language aspect, multiple losses from attention and language are accumulated to train the network end to end. We conduct experiments on standard datasets for scene text recognition, including Street View Text, IIIT5K and ICDAR datasets. The experimental results show that our CNN-based method achieves state-of-the-art performance on several benchmark datasets, even without the use of an RNN.
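The ensemble and multi-loss ideas in points (2) and (3) can be illustrated for a single decoding step with a minimal sketch. This is not the authors' implementation: the function names, the fusion by summing logits, and the equal loss weights are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -log_softmax(logits)[target]


def ensemble_step(att_logits, lang_logits, target):
    """One decoding step with two heads (hypothetical helper).

    Assumption: the attention and language heads are fused by summing
    their logits, and their losses are accumulated with equal weight.
    """
    fused = [a + l for a, l in zip(att_logits, lang_logits)]
    prediction = max(range(len(fused)), key=fused.__getitem__)
    # Each head is supervised separately, then the losses are summed.
    loss = cross_entropy(att_logits, target) + cross_entropy(lang_logits, target)
    return prediction, loss


# Toy scores over a 3-character alphabet (made-up numbers).
att = [2.0, 0.5, 0.1]   # visual evidence favours class 0
lang = [0.2, 1.5, 0.3]  # language prior favours class 1
pred, loss = ensemble_step(att, lang, target=0)
```

In this toy case the two heads disagree and the fused scores decide the prediction, while the accumulated loss still penalises each head separately.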
Pages: 248-256
Page count: 9