Lightweight Scene Text Recognition Based on Transformer

Cited by: 1
Authors
Luan, Xin [1 ,2 ,3 ]
Zhang, Jinwei [1 ,2 ,3 ]
Xu, Miaomiao [1 ,2 ,3 ]
Silamu, Wushouer [1 ,2 ,3 ]
Li, Yanbing [1 ,2 ,3 ]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, 777 Huarui St, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Lab Multilanguage Informat Technol, 777 Huarui St, Urumqi 830017, Peoples R China
[3] Xinjiang Univ, Xinjiang Multilingual Informat Technol Res Ctr, 777 Huarui St, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
scene text recognition; transformer; attention mechanism;
DOI
10.3390/s23094490
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Scene text recognition (STR), which aims to recognize text in natural scenes automatically, is an active research area in computer vision. Current attention-based encoder-decoder frameworks struggle to align feature regions precisely with the target characters in complex or low-quality images, a phenomenon known as attention drift. In addition, with the rise of Transformer-based models, growing parameter counts lead to higher computational costs. To address both problems, we build on recent Vision Transformer (ViT) research and add a position-enhancement branch that alleviates attention drift; the position information is dynamically fused with the visual information to improve recognition accuracy. Experimental results show that our model achieves a 3% higher average recognition accuracy on the test set than the baseline. At the same time, the model keeps a small number of parameters and a fast inference speed, achieving a good balance between accuracy, speed, and computational load.
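The abstract does not say how the position and visual information are fused. As a rough illustration only, the sketch below shows one common way to fuse two feature vectors dynamically: a sigmoid gate computed from both inputs. The function name `gated_fusion`, the gate parameters `weights` and `bias`, and the toy dimensions are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, position, weights, bias):
    """Fuse one visual and one positional feature vector with a gate.

    visual, position: lists of d floats.
    weights: d rows of 2*d floats (hypothetical gate parameters).
    bias: list of d floats (hypothetical gate parameters).
    """
    d = len(visual)
    concat = visual + position  # concatenation [v; p], length 2*d
    fused = []
    for i in range(d):
        # The gate for channel i depends on both inputs.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        # Convex combination: g near 1 favors the visual feature,
        # g near 0 favors the positional feature.
        fused.append(g * visual[i] + (1.0 - g) * position[i])
    return fused


# Toy inputs; in a trained model the gate parameters would be learned.
rng = random.Random(0)
d = 4
visual = [rng.uniform(-1, 1) for _ in range(d)]
position = [rng.uniform(-1, 1) for _ in range(d)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
bias = [0.0] * d

fused = gated_fusion(visual, position, weights, bias)
print(fused)
```

Because the gate is recomputed for every input, the mix between the two branches can change per character position and per channel, which is what "dynamic" fusion usually refers to. With all gate parameters set to zero, the gate is 0.5 everywhere and the result is the plain average of the two vectors.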
Pages: 14
Related papers
50 in total
  • [1] A Transformer-Based Framework for Scene Text Recognition
    Selvam, Prabu
    Koilraj, Joseph Abraham Sundar
    Tavera Romero, Carlos Andres
    Alharbi, Meshal
    Mehbodniya, Abolfazl
    Webber, Julian L.
    Sengan, Sudhakar
    [J]. IEEE ACCESS, 2022, 10 : 100895 - 100910
  • [2] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    [J]. PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [3] RRTrN: A lightweight and effective backbone for scene text recognition
    Zhou, Qing
    Gao, Junyu
    Yuan, Yuan
    Wang, Qi
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 243
  • [4] STR Transformer: A Cross-domain Transformer for Scene Text Recognition
    Wu, Xing
    Tang, Bin
    Zhao, Ming
    Wang, Jianjia
    Guo, Yike
    [J]. APPLIED INTELLIGENCE, 2023, 53 (03) : 3444 - 3458
  • [5] STR Transformer: A Cross-domain Transformer for Scene Text Recognition
    Xing Wu
    Bin Tang
    Ming Zhao
    Jianjia Wang
    Yike Guo
    [J]. Applied Intelligence, 2023, 53 : 3444 - 3458
  • [6] Vision Transformer for Fast and Efficient Scene Text Recognition
    Atienza, Rowel
    [J]. DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I, 2021, 12821 : 319 - 334
  • [7] Pure Transformer with Integrated Experts for Scene Text Recognition
    Tan, Yew Lee
    Kong, Adams Wai-Kin
    Kim, Jung-Jae
    [J]. COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 481 - 497
  • [8] Outline Generation Transformer for Bilingual Scene Text Recognition
    Ho, Jui-Teng
    Hsu, Gee-Sern
    Yanushkevich, Svetlana
    Gavrilova, Marina L.
    [J]. 2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [9] Display-Semantic Transformer for Scene Text Recognition
    Yang, Xinqi
    Silamu, Wushour
    Xu, Miaomiao
    Li, Yanbing
    [J]. SENSORS, 2023, 23 (19)
  • [10] SAM: Self Attention Mechanism for Scene Text Recognition Based on Swin Transformer
    Shuai, Xiang
    Wang, Xiao
    Wang, Wei
    Yuan, Xin
    Xu, Xin
    [J]. MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 443 - 454