A Multi-head Self-relation Network for Scene Text Recognition

Cited by: 0
Authors
Zhou, Junwei [1, 2]
Gao, Hongchao [1]
Dai, Jiao [1]
Liu, Dongqin [1]
Han, Jizhong [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
DOI
10.1109/ICPR48806.2021.9413339
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text embedded in scene images is ubiquitous in daily life. However, recognizing text in natural scene images remains challenging because of its diverse shapes and distorted patterns. Recent advanced recognition networks generally treat scene text recognition as a sequence prediction task. Although they achieve excellent performance, these networks treat feature map cells as independent and update each cell's state without using information from its related cells. In addition, the local receptive field of a traditional convolutional neural network (CNN) means that a single cell cannot cover the whole text region in an image. As a result, existing recognition networks cannot extract the global context information of a visual scene. To address these problems, we propose a Multi-head Self-relation Network (MSRN) for scene text recognition. The MSRN consists of several multi-head self-relation layers, which are designed to extract the global context information of a visual scene, so that the information of related cells can be fused by each multi-head self-relation layer. Experiments demonstrate that the proposed network achieves superior performance on several public benchmark datasets, including IC03, IC13, IC15, and SVT-Perspective.
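The abstract does not give the formulation of the multi-head self-relation layer, so the following is only a rough sketch of the general idea it describes: every feature map cell attends to every other cell, and the related cells' information is fused into its updated state. It uses generic multi-head scaled dot-product self-attention as a stand-in, which may differ from the authors' layer. The function names, the random untrained projection matrices, and the 2 x 4 feature map size are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    # Hypothetical untrained projection weights; a real model would learn these.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(cells, num_heads, rng):
    """Fuse information across all cells.

    cells: list of N feature vectors, one per feature map cell, each of size d.
    Returns the fused N x d features and one N x N weight map per head.
    """
    d = len(cells[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = d // num_heads
    head_outputs, weight_maps = [], []
    for _ in range(num_heads):
        q = matmul(cells, random_matrix(d, d_head, rng))  # N x d_head queries
        k = matmul(cells, random_matrix(d, d_head, rng))  # N x d_head keys
        v = matmul(cells, random_matrix(d, d_head, rng))  # N x d_head values
        scores = matmul(q, transpose(k))                  # N x N pairwise relations
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))           # each cell sees all cells
        weight_maps.append(weights)
    # Concatenate the per-head outputs back into d-dimensional cell features.
    fused = [[v for head in head_outputs for v in head[i]] for i in range(len(cells))]
    return fused, weight_maps


rng = random.Random(0)
height, width, channels = 2, 4, 8  # assumed toy feature map size
cells = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(height * width)]
fused, weight_maps = multi_head_self_attention(cells, num_heads=2, rng=rng)
print(len(fused), len(fused[0]), len(weight_maps))
```

Because each row of a weight map spans all cells of the flattened feature map, a single cell's update can draw on the whole text region, which a local convolutional receptive field cannot do in one step.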
Pages: 3969-3976
Number of pages: 8