A Multi-head Self-relation Network for Scene Text Recognition

Cited by: 0
Authors
Zhou, Junwei [1 ,2 ]
Gao, Hongchao [1 ]
Dai, Jiao [1 ]
Liu, Dongqin [1 ]
Han, Jizhong [1 ]
Affiliations
[1] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
Keywords:
DOI: 10.1109/ICPR48806.2021.9413339
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
The text embedded in scene images can be seen everywhere in our lives. However, recognizing text in natural scene images is still challenging because of its diverse shapes and distorted patterns. Recently, advanced recognition networks have generally treated scene text recognition as a sequence prediction task. Although they achieve excellent performance, these recognition networks treat feature-map cells as independent individuals and update each cell's state without using information from its related cells. Moreover, the local receptive field of a traditional convolutional neural network (CNN) means that a single cell cannot cover the whole text region of an image. Because of these issues, existing recognition networks cannot extract global context information from a visual scene. To address these problems, we propose a Multi-head Self-relation Network (MSRN) for scene text recognition. The MSRN consists of several multi-head self-relation layers designed to extract the global context information of a visual scene, so that each layer can fuse the information of related cells. Experiments on several public datasets demonstrate that the proposed recognition network achieves superior performance on benchmarks including IC03, IC13, IC15, and SVT-Perspective.
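For intuition, here is a minimal sketch of how such a multi-head self-relation layer could be realized: the CNN feature map is flattened into a sequence of cells, and multi-head scaled dot-product attention lets every cell attend to every other cell, so the fused output carries global context rather than only a local receptive field. This is an illustrative PyTorch sketch based on the abstract, not the authors' released code; the use of nn.MultiheadAttention, the head count, and the residual-plus-LayerNorm update are assumptions.

```python
# Minimal sketch (not the authors' code): a multi-head self-relation layer
# that lets every feature-map cell attend to all other cells, so each cell's
# update can use global context rather than only its local receptive field.
# Head count, dimensions, and the residual/normalization choices are assumptions.
import torch
import torch.nn as nn


class MultiHeadSelfRelation(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Multi-head scaled dot-product attention with per-head projections.
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: CNN feature map of shape (batch, channels, height, width)
        b, c, h, w = feat.shape
        # Flatten spatial cells into a sequence: (batch, h*w, channels)
        cells = feat.flatten(2).transpose(1, 2)
        # Each cell queries every other cell; information from related cells
        # is fused via the attention-weighted sum.
        fused, _ = self.attn(cells, cells, cells)
        cells = self.norm(cells + fused)  # residual + layer norm (assumed)
        # Restore the (batch, channels, height, width) layout.
        return cells.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    layer = MultiHeadSelfRelation(channels=256, num_heads=8)
    x = torch.randn(2, 256, 8, 32)   # e.g. feature map of a 32x128 text image
    print(layer(x).shape)            # torch.Size([2, 256, 8, 32])
```

According to the abstract, several such layers would be stacked on top of the CNN backbone before the sequence prediction stage, so that every feature-map cell is updated with information from its related cells.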
Pages: 3969-3976
Number of pages: 8
Related Papers
(50 in total; entries [41]-[50] shown below)
  • [41] Li, Ping. Hybrid neural network model based on multi-head attention for English text emotion analysis. EAI Endorsed Transactions on Scalable Information Systems, 2022, 9(35).
  • [42] Hamad, Rebeen Ali; Kimura, Masashi; Yang, Longzhi; Woo, Wai Lok; Wei, Bo. Dilated causal convolution with multi-head self attention for sensor human activity recognition. Neural Computing and Applications, 2021, 33(20): 13705-13722.
  • [43] Li, Yutong; Liu, Zhenyu; Zhou, Li; Yuan, Xiaoyan; Shangguan, Zixuan; Hu, Xiping; Hu, Bin. A facial depression recognition method based on hybrid multi-head cross attention network. Frontiers in Neuroscience, 2023, 17.
  • [44] Wen, Zhengyao; Lin, Wenzhong; Wang, Tao; Xu, Ge. Distract Your Attention: Multi-Head Cross Attention Network for Facial Expression Recognition. Biomimetics, 2023, 8(2).
  • [45] Ni, Ran; Jiang, Haiyang; Zhou, Lu; Lu, Yuanyao. Lip Recognition Based on Bi-GRU with Multi-Head Self-Attention. Artificial Intelligence Applications and Innovations (AIAI 2024), Pt III, 2024, 713: 99-110.
  • [46] Qi, Tao; Wu, Chuhan; Wu, Fangzhao; Ge, Suyu; Liu, Junxin; Huang, Yongfeng; Xie, Xing. Fast Neural Chinese Named Entity Recognition with Multi-head Self-attention. Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding, 2019, 1134: 98-110.
  • [47] Cheng, Shuyu; Liu, Yingan. Research on Transportation Mode Recognition Based on Multi-Head Attention Temporal Convolutional Network. Sensors, 2023, 23(7).
  • [49] Zhang, Min; Ma, Meng; Wang, Ping. Scene Text Recognition with Cascade Attention Network. Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21), 2021: 385-393.
  • [50] Fathullah, Yassir; Wu, Chunyang; Shangguan, Yuan; Jia, Junteng; Xiong, Wenhan; Mahadeokar, Jay; Liu, Chunxi; Shi, Yangyang; Kalinli, Ozlem; Seltzer, Mike; Gales, Mark J. F. Multi-Head State Space Model for Speech Recognition. Interspeech 2023, 2023: 241-245.