A Multi-head Self-relation Network for Scene Text Recognition

Cited by: 0
Authors
Zhou, Junwei [1 ,2 ]
Gao, Hongchao [1 ]
Dai, Jiao [1 ]
Liu, Dongqin [1 ]
Han, Jizhong [1 ]
Affiliations
[1] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
Keywords:
DOI: 10.1109/ICPR48806.2021.9413339
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
The text embedded in scene images can be seen everywhere in our lives. However, recognizing text in natural scene images is still challenging because of its diverse shapes and distorted patterns. Recently, advanced recognition networks have generally treated scene text recognition as a sequence prediction task. Although they achieve excellent performance, these recognition networks treat feature-map cells as independent individuals and update each cell's state without using information from its related cells. Moreover, the local receptive field of a traditional convolutional neural network (CNN) means that a single cell cannot cover the whole text region of an image. Because of these issues, existing recognition networks cannot extract global context information from a visual scene. To address these problems, we propose a Multi-head Self-relation Network (MSRN) for scene text recognition. The MSRN consists of several multi-head self-relation layers designed to extract the global context information of a visual scene, so that each layer can fuse the information of related cells. Experiments on several public datasets demonstrate that the proposed recognition network achieves superior performance on benchmarks including IC03, IC13, IC15, and SVT-Perspective.
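For intuition, here is a minimal sketch of how such a multi-head self-relation layer could be realized: the CNN feature map is flattened into a sequence of cells, and multi-head scaled dot-product attention lets every cell attend to every other cell, so the fused output carries global context rather than only a local receptive field. This is an illustrative PyTorch sketch based on the abstract, not the authors' released code; the use of nn.MultiheadAttention, the head count, and the residual-plus-LayerNorm update are assumptions.

```python
# Minimal sketch (not the authors' code): a multi-head self-relation layer
# that lets every feature-map cell attend to all other cells, so each cell's
# update can use global context rather than only its local receptive field.
# Head count, dimensions, and the residual/normalization choices are assumptions.
import torch
import torch.nn as nn


class MultiHeadSelfRelation(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Multi-head scaled dot-product attention with per-head projections.
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: CNN feature map of shape (batch, channels, height, width)
        b, c, h, w = feat.shape
        # Flatten spatial cells into a sequence: (batch, h*w, channels)
        cells = feat.flatten(2).transpose(1, 2)
        # Each cell queries every other cell; information from related cells
        # is fused via the attention-weighted sum.
        fused, _ = self.attn(cells, cells, cells)
        cells = self.norm(cells + fused)  # residual + layer norm (assumed)
        # Restore the (batch, channels, height, width) layout.
        return cells.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    layer = MultiHeadSelfRelation(channels=256, num_heads=8)
    x = torch.randn(2, 256, 8, 32)   # e.g. feature map of a 32x128 text image
    print(layer(x).shape)            # torch.Size([2, 256, 8, 32])
```

According to the abstract, several such layers would be stacked on top of the CNN backbone before the sequence prediction stage, so that every feature-map cell is updated with information from its related cells.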
Pages: 3969-3976
Number of pages: 8
Related Papers
(50 in total; entries [41]-[50] shown below)
  • [41] Li, Ping. Hybrid neural network model based on multi-head attention for English text emotion analysis. EAI Endorsed Transactions on Scalable Information Systems, 2022, 9(35).
  • [42] Hamad, Rebeen Ali; Kimura, Masashi; Yang, Longzhi; Woo, Wai Lok; Wei, Bo. Dilated causal convolution with multi-head self attention for sensor human activity recognition. Neural Computing and Applications, 2021, 33(20): 13705-13722.
  • [43] Li, Yutong; Liu, Zhenyu; Zhou, Li; Yuan, Xiaoyan; Shangguan, Zixuan; Hu, Xiping; Hu, Bin. A facial depression recognition method based on hybrid multi-head cross attention network. Frontiers in Neuroscience, 2023, 17.
  • [44] Wen, Zhengyao; Lin, Wenzhong; Wang, Tao; Xu, Ge. Distract Your Attention: Multi-Head Cross Attention Network for Facial Expression Recognition. Biomimetics, 2023, 8(2).
  • [45] Ni, Ran; Jiang, Haiyang; Zhou, Lu; Lu, Yuanyao. Lip Recognition Based on Bi-GRU with Multi-Head Self-Attention. Artificial Intelligence Applications and Innovations (AIAI 2024), Pt III, 2024, 713: 99-110.
  • [46] Qi, Tao; Wu, Chuhan; Wu, Fangzhao; Ge, Suyu; Liu, Junxin; Huang, Yongfeng; Xie, Xing. Fast Neural Chinese Named Entity Recognition with Multi-head Self-attention. Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding, 2019, 1134: 98-110.
  • [47] Cheng, Shuyu; Liu, Yingan. Research on Transportation Mode Recognition Based on Multi-Head Attention Temporal Convolutional Network. Sensors, 2023, 23(7).
  • [49] Zhang, Min; Ma, Meng; Wang, Ping. Scene Text Recognition with Cascade Attention Network. Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21), 2021: 385-393.
  • [50] Fathullah, Yassir; Wu, Chunyang; Shangguan, Yuan; Jia, Junteng; Xiong, Wenhan; Mahadeokar, Jay; Liu, Chunxi; Shi, Yangyang; Kalinli, Ozlem; Seltzer, Mike; Gales, Mark J. F. Multi-Head State Space Model for Speech Recognition. Interspeech 2023, 2023: 241-245.