Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment

Cited by: 0
Authors
Hu, Yijie [1]
Dong, Bin [2]
Huang, Kaizhu [3]
Ding, Lei [2]
Wang, Wei [1]
Huang, Xiaowei [4]
Wang, Qiu-Feng [1]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Renai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Ricoh Software Res Ctr Beijing Co Ltd, Xizhimenwai St, Beijing 100080, Peoples R China
[3] Duke Kunshan Univ, Data Sci Res Ctr, Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
[4] Univ Liverpool, Dept Comp Sci, Lime St, Liverpool L69 3BX, Merseyside, England
Funding
Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China
Keywords
OCR; scene text recognition; deformable attention; attention alignment; dual path network
DOI
10.1145/3633517
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scene text recognition (STR), one typical sequence-to-sequence problem, has drawn much attention recently in multimedia applications. To guarantee good performance, it is essential for STR to obtain aligned character-wise features from the whole-image feature maps. While most present works adopt fully data-driven attention-based alignment, such practice ignores specific character geometric information. In this article, built upon a group of learnable geometric points, we propose a novel shape-driven attention alignment method that is able to obtain character-wise features. Concretely, we first design a corner detector to generate a shape map to guide the attention alignments explicitly, where a series of points can be learned to represent character-wise features flexibly. We then propose a dual-path network with a mutual learning and cooperating strategy that successfully combines CNN with a ViT-based model, leading to further accuracy improvement. We conduct extensive experiments to evaluate the proposed method on various scene text benchmarks, including six popular regular and irregular datasets, two more challenging datasets (i.e., WordArt and OST), and three Chinese datasets. Experimental results indicate that our method can achieve superior performance with a comparable model size against many state-of-the-art models.
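As a rough illustration of what "a series of points ... to represent character-wise features" could mean in practice, the sketch below samples a feature map at a few fractional point locations with bilinear interpolation and combines the samples with softmax attention weights. It is not the authors' implementation: the function names `bilinear_sample` and `character_feature`, the nested-list feature layout, and the fixed points and logits are all assumptions made for illustration. In the paper the points are learned and guided by a shape map, which is omitted here.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate an H x W x C nested-list feature map at (x, y)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the point so that it stays inside the feature map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    out = []
    for c in range(len(feature_map[0][0])):
        top = (1 - wx) * feature_map[y0][x0][c] + wx * feature_map[y0][x1][c]
        bottom = (1 - wx) * feature_map[y1][x0][c] + wx * feature_map[y1][x1][c]
        out.append((1 - wy) * top + wy * bottom)
    return out


def character_feature(feature_map, points, logits):
    """Softmax-weighted sum of the features sampled at one character's points.

    `points` is a list of (x, y) pairs and `logits` holds one attention score
    per point; both are hypothetical inputs that a trained model would predict.
    """
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    samples = [bilinear_sample(feature_map, px, py) for px, py in points]
    channels = len(samples[0])
    return [
        sum(wt * s[c] for wt, s in zip(weights, samples))
        for c in range(channels)
    ]


# Toy 3 x 4 feature map whose two channels store the (x, y) coordinates.
toy_map = [[[float(x), float(y)] for x in range(4)] for y in range(3)]
toy_points = [(1.5, 0.25), (2.5, 0.75)]
toy_feature = character_feature(toy_map, toy_points, [0.0, 0.0])
```

Because the toy feature map stores its own coordinates, each sample equals the point it was taken at, which makes the interpolation easy to check by hand.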
Pages: 20