Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment

Citations: 0
Authors
Hu, Yijie [1]
Dong, Bin [2]
Huang, Kaizhu [3]
Ding, Lei [2]
Wang, Wei [1]
Huang, Xiaowei [4]
Wang, Qiu-Feng [1]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Renai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Ricoh Software Res Ctr Beijing Co Ltd, Xizhimenwai St, Beijing 100080, Peoples R China
[3] Duke Kunshan Univ, Data Sci Res Ctr, Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
[4] Univ Liverpool, Dept Comp Sci, Lime St, Liverpool L69 3BX, Merseyside, England
Funding
Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China;
Keywords
OCR; scene text recognition; deformable attention; attention alignment; dual path network;
DOI
10.1145/3633517
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Scene text recognition (STR), one typical sequence-to-sequence problem, has drawn much attention recently in multimedia applications. To guarantee good performance, it is essential for STR to obtain aligned character-wise features from the whole-image feature maps. While most present works adopt fully data-driven attention-based alignment, such practice ignores specific character geometric information. In this article, built upon a group of learnable geometric points, we propose a novel shape-driven attention alignment method that is able to obtain character-wise features. Concretely, we first design a corner detector to generate a shape map to guide the attention alignments explicitly, where a series of points can be learned to represent character-wise features flexibly. We then propose a dual-path network with a mutual learning and cooperating strategy that successfully combines CNN with a ViT-based model, leading to further accuracy improvement. We conduct extensive experiments to evaluate the proposed method on various scene text benchmarks, including six popular regular and irregular datasets, two more challenging datasets (i.e., WordArt and OST), and three Chinese datasets. Experimental results indicate that our method can achieve superior performance with a comparable model size against many state-of-the-art models.
Pages: 20
Related Papers
50 in total
  • [41] DPNET: DUAL-PATH NETWORK FOR EFFICIENT OBJECT DETECTION WITH LIGHTWEIGHT SELF-ATTENTION
    Shi, Huimin
    Zhou, Quan
    Ni, Yinghao
    Wu, Xiaofu
    Latecki, Longin Jan
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 771 - 775
  • [42] Dual-Path Convolutional Neural Network Based on Band Interaction Block for Acoustic Scene Classification
    Jiang, Pengxu
    Yang, Yang
    Xie, Yue
    Zou, Cairong
    Wang, Qingyun
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2024, E107A (07) : 1040 - 1044
  • [43] Classfication of Hyperspectral Image With Attention Mechanism-Based Dual-Path Convolutional Network
    Pu, Chunyu
    Huang, Hong
    Luo, Liuyang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [44] DPNet: Dual-Path Network for Real-Time Object Detection With Lightweight Attention
    Zhou, Quan
    Shi, Huimin
    Xiang, Weikang
    Kang, Bin
    Latecki, Longin Jan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 4504 - 4518
  • [45] Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-identification
    Xia, Jiaer
    Tan, Lei
    Dai, Pingyang
    Zhao, Mingbo
    Wu, Yongjian
    Cao, Liujuan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6198 - 6206
  • [46] Scene text recognition via dual character counting-aware visual and semantic modeling network
    Ke XIAO
    Anna ZHU
    Brian Kenji IWANA
    Cheng-Lin LIU
    Science China (Information Sciences), 2024, 67 (03) : 313 - 314
  • [47] Scene text recognition via dual character counting-aware visual and semantic modeling network
    Xiao, Ke
    Zhu, Anna
    Iwana, Brian Kenji
    Liu, Cheng-Lin
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (03)
  • [48] DP-DWA: DUAL-PATH DYNAMIC WEIGHT ATTENTION NETWORK WITH STREAMING DFSMN-SAN FOR AUTOMATIC SPEECH RECOGNITION
    Ma, Dongpeng
    Wang, Yiwen
    He, Liqiang
    Jin, Mingjie
    Su, Dan
    Yu, Dong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7692 - 7696
  • [49] SaHAN: Scale-aware hierarchical attention network for scene text recognition
    Zhang, Jiaxin
    Luo, Canjie
    Jin, Lianwen
    Wang, Tianwei
    Li, Ziyan
    Zhou, Weiying
    PATTERN RECOGNITION LETTERS, 2020, 136 : 205 - 211
  • [50] SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition
    Dai, Pengwen
    Zhang, Hua
    Cao, Xiaochun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1687 - 1701