Multi-granularity Prediction for Scene Text Recognition

Cited by: 40
Authors
Wang, Peng [1]
Da, Cheng [1]
Yao, Cong [1]
Affiliation
[1] Alibaba DAMO Academy, Beijing, People's Republic of China
Keywords
Scene text recognition; ViT; Multi-granularity prediction; EFFICIENT;
DOI
10.1007/978-3-031-19815-1_20
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks.
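To make the idea of a multi-granularity output space concrete, the sketch below shows how a single word could be expressed as three target sequences: characters, BPE subwords, and WordPiece subwords. It is a toy illustration only, not the MGP-STR implementation. The merge rules in `TOY_MERGES`, the vocabulary `TOY_VOCAB`, and the three helper functions are hypothetical and chosen by hand for this example; a real system would presumably use tokenizers learned from a large text corpus.

```python
from typing import List, Set, Tuple

# Hypothetical, hand-picked rules for illustration (not from the paper).
TOY_MERGES: List[Tuple[str, str]] = [("e", "e"), ("f", "f"), ("c", "o")]
TOY_VOCAB: Set[str] = {"cof", "##fee", "c", "##o", "##f", "##e"}


def char_tokens(word: str) -> List[str]:
    """Character-level target: one symbol per character."""
    return list(word)


def bpe_tokens(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Toy BPE: start from characters and apply merge rules in priority order."""
    symbols = list(word)
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            # Fuse an adjacent (left, right) pair into a single symbol.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def wordpiece_tokens(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Toy WordPiece: greedy longest-match-first; '##' marks a continuation piece."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry covers this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


if __name__ == "__main__":
    word = "coffee"
    print("char     :", char_tokens(word))
    print("BPE      :", bpe_tokens(word, TOY_MERGES))
    print("WordPiece:", wordpiece_tokens(word, TOY_VOCAB))
```

With these toy rules, the word "coffee" becomes six character targets, three BPE targets (`co`, `ff`, `ee`), and two WordPiece targets (`cof`, `##fee`). Predicting the coarser sequences alongside the characters is one way to read the abstract's claim that linguistic information enters through the output space rather than through a separate language model.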
Pages: 339-355
Page count: 17