Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 146
Authors
Fang, Shancheng [1]
Xie, Hongtao [1]
Wang, Yuxin [1]
Mao, Zhendong [1]
Zhang, Yongdong [1]
Affiliation
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
DOI
10.1109/CVPR46437.2021.00702
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative network, ABINet, for scene text recognition. First, "autonomous" means blocking gradient flow between the vision and language models to enforce explicit language modeling. Second, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Third, we propose an iterative correction scheme for the language model, which effectively alleviates the impact of noisy input. Additionally, based on an ensemble of iterative predictions, we propose a self-training method that can learn effectively from unlabeled images. Extensive experiments indicate that ABINet performs better on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. In addition, ABINet trained with ensemble self-training shows promising improvement toward human-level recognition.
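The cloze-style and iterative ideas in the abstract can be illustrated with a deliberately simplified toy. This is not the ABINet architecture: the lexicon, the vote-counting "language model", the confidence threshold, and the function names `cloze_predict` and `iterative_correct` are all hypothetical stand-ins, and gradient blocking between the vision and language models is not covered because no training takes place. The sketch only shows (a) predicting one character from the context on both sides while hiding that character itself, and (b) feeding the fused result back into the language step for several rounds.

```python
from collections import Counter

# Hypothetical toy vocabulary standing in for learned linguistic knowledge.
LEXICON = ["house", "horse", "mouse", "shore"]


def cloze_predict(chars, i, lexicon):
    """Guess the character at position i from all OTHER positions.

    Position i itself is hidden (cloze style), so the guess relies only on
    left and right context, i.e. it is bidirectional.
    """
    votes = Counter()
    for word in lexicon:
        if len(word) != len(chars):
            continue
        # Count how many context positions (excluding i) agree with this word.
        score = sum(1 for j, c in enumerate(chars) if j != i and word[j] == c)
        votes[word[i]] += score
    if not votes:
        return chars[i]  # no candidate of matching length: keep the input
    return votes.most_common(1)[0][0]


def iterative_correct(vision_chars, vision_conf, lexicon,
                      iterations=3, threshold=0.5):
    """Refine a noisy visual prediction over several rounds.

    Each round, the toy language step re-predicts every position from the
    current string; fusion keeps the visual character where its confidence
    is at least `threshold` and takes the language guess otherwise.
    """
    current = list(vision_chars)
    for _ in range(iterations):
        lang = [cloze_predict(current, i, lexicon) for i in range(len(current))]
        current = [v if conf >= threshold else guess
                   for v, conf, guess in zip(vision_chars, vision_conf, lang)]
    return "".join(current)


if __name__ == "__main__":
    # Assumed noisy visual output with two low-confidence characters.
    print(iterative_correct("hcvse", [0.9, 0.2, 0.2, 0.9, 0.9], LEXICON))
```

In this toy, later rounds see the corrected string rather than the raw visual output, which is the sense in which iteration reduces the effect of noisy input on the language step.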
Pages: 7094-7103
Page count: 10