Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment

Authors
Hu, Yijie [1]
Dong, Bin [2]
Huang, Kaizhu [3]
Ding, Lei [2]
Wang, Wei [1]
Huang, Xiaowei [4]
Wang, Qiu-Feng [1]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Renai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Ricoh Software Res Ctr Beijing Co Ltd, Xizhimenwai St, Beijing 100080, Peoples R China
[3] Duke Kunshan Univ, Data Sci Res Ctr, Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
[4] Univ Liverpool, Dept Comp Sci, Lime St, Liverpool L69 3BX, Merseyside, England
Funding
Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China
Keywords
OCR; scene text recognition; deformable attention; attention alignment; dual-path network
DOI
10.1145/3633517
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Scene text recognition (STR), a typical sequence-to-sequence problem, has recently drawn much attention in multimedia applications. To perform well, an STR model must obtain aligned character-wise features from whole-image feature maps. Most existing works adopt fully data-driven, attention-based alignment, which ignores character-specific geometric information. In this article, building on a group of learnable geometric points, we propose a novel shape-driven attention alignment method that obtains character-wise features. Concretely, we first design a corner detector that generates a shape map to explicitly guide the attention alignment, so that a series of points can be learned to flexibly represent character-wise features. We then propose a dual-path network with a mutual learning and cooperation strategy that combines a CNN with a ViT-based model, further improving accuracy. We conduct extensive experiments on various scene text benchmarks, including six popular regular and irregular datasets, two more challenging datasets (WordArt and OST), and three Chinese datasets. Experimental results indicate that our method outperforms many state-of-the-art models while having a comparable model size.
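The abstract describes representing each character by features gathered at a set of learnable points. As a rough illustration only, the sketch below shows generic deformable-attention-style aggregation (deformable attention is listed in the keywords): a character feature is a softmax-weighted sum of bilinearly sampled feature-map values at a reference point plus offsets. This is not the authors' implementation. The reference point, offsets, and logits are hypothetical inputs here (in the paper the points are learned and guided by the corner-detector shape map, whose details are not given in this record), and the function names `bilinear_sample` and `character_feature` are made up for this example.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample feat of shape (C, H, W) at continuous pixel coords (x, y).
    Coordinates are clamped to [0, W-1] x [0, H-1] before interpolation."""
    C, H, W = feat.shape
    x = float(np.clip(x, 0, W - 1))
    y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - dx) * feat[:, y0, x0] + dx * feat[:, y0, x1]
    bottom = (1 - dx) * feat[:, y1, x0] + dx * feat[:, y1, x1]
    return (1 - dy) * top + dy * bottom

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def character_feature(feat, ref, offsets, logits):
    """Aggregate one character-wise feature (C,) from K sampling points.
    ref: (x, y) reference point in pixels (hypothetical, e.g. from a shape map).
    offsets: (K, 2) per-point (dx, dy) offsets; logits: (K,) attention scores."""
    weights = softmax(logits)
    samples = np.stack([
        bilinear_sample(feat, ref[0] + ox, ref[1] + oy) for ox, oy in offsets
    ])  # shape (K, C)
    return weights @ samples

# Toy 2-channel, 3x4 feature map where feat[c, y, x] = 12*c + 4*y + x.
feat = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
f = character_feature(feat, ref=(1.0, 1.0),
                      offsets=[(0.5, -0.5), (-1.0, 0.0)], logits=[0.0, 0.0])
print(f)  # [ 3.75 15.75]
```

With equal logits the two points get weight 0.5 each; because the toy map is linear in x and y, the result equals the map's value at the mean sampling point (0.75, 0.75). In a trained model the offsets and logits would be predicted or learned rather than fixed.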
Pages: 20