Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 146
Authors
Fang, Shancheng [1]
Xie, Hongtao [1]
Wang, Yuxin [1]
Mao, Zhendong [1]
Zhang, Yongdong [1]
Affiliation
[1] University of Science and Technology of China, Hefei, Anhui, People's Republic of China
DOI
10.1109/CVPR46437.2021.00702
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition.
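The bidirectional cloze prediction and iterative correction described in the abstract can be illustrated with a toy sketch. This is not the ABINet implementation: the lexicon, the confidence threshold, and the names `cloze_fill` and `iterative_correct` are all hypothetical, and a lexicon lookup stands in for the learned language model. It only shows the general idea of predicting a masked character from both its left and right context, and of feeding the corrected string back in for another round.

```python
# Hypothetical toy lexicon standing in for learned linguistic knowledge.
LEXICON = ["house", "horse", "mouse", "shore", "store"]


def cloze_fill(text, i, lexicon=LEXICON):
    """Predict the character at position i from its left AND right context.

    Position i is treated as masked. Each same-length lexicon word is scored
    by how many of the remaining positions it agrees with, and the best
    word's character at position i is returned (ties: first word in lexicon).
    """
    candidates = [w for w in lexicon if len(w) == len(text)]
    if not candidates:
        return text[i]  # no linguistic evidence: keep the current character

    def context_matches(word):
        return sum(word[j] == text[j] for j in range(len(text)) if j != i)

    return max(candidates, key=context_matches)[i]


def iterative_correct(vision_chars, vision_conf, max_iters=3, threshold=0.5):
    """Refine a noisy character prediction by repeated cloze-style correction.

    vision_chars: string predicted by a (hypothetical) vision model.
    vision_conf:  per-character confidences in [0, 1].
    Only low-confidence positions are re-predicted; the loop stops early
    once a full pass changes nothing.
    """
    text = list(vision_chars)
    for _ in range(max_iters):
        updated = list(text)
        for i, conf in enumerate(vision_conf):
            if conf < threshold:
                # All positions are predicted in parallel from the same input.
                updated[i] = cloze_fill(text, i)
        if updated == text:
            break
        text = updated
    return "".join(text)


if __name__ == "__main__":
    print(iterative_correct("hcuse", [0.9, 0.2, 0.9, 0.9, 0.9]))
```

In this sketch the output of one pass becomes the input of the next, so later passes see a cleaner context than the first one did. The gradient blocking between vision and language models mentioned in the abstract is a training-time detail and is not modeled here.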
Pages: 7094-7103
Page count: 10
Related papers
36 in total
  • [1] ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
    Fang, Shancheng
    Mao, Zhendong
    Xie, Hongtao
    Wang, Yuxin
    Yan, Chenggang
    Zhang, Yongdong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7123 - 7141
  • [2] Scene text recognition with context-aware autonomous bidirectional iterative models
    Zhao, Xiaoqing
    Xu, Miaomiao
    Li, Yanbing
    Huang, Hao
    Silamu, Wushour
    [J]. Journal of Intelligent and Fuzzy Systems, 2024, 46 (04) : 8605 - 8616
  • [3] DISTILLING KNOWLEDGE OF BIDIRECTIONAL LANGUAGE MODEL FOR SCENE TEXT RECOGNITION
    Orihashi, Shota
    Yamazaki, Yoshihiro
    Uchida, Mihiro
    Takashima, Akihiko
    Masumura, Ryo
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2165 - 2169
  • [4] IterVM: Iterative Vision Modeling Module for Scene Text Recognition
    Chu, Xiaojie
    Wang, Yongtao
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1393 - 1399
  • [5] Bidirectional Scene Text Recognition with a Single Decoder
    Bleeker, Maurits
    de Rijke, Maarten
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2664 - 2671
  • [6] Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling
    Fang, Shancheng
    Xie, Hongtao
    Zha, Zheng-Jun
    Sun, Nannan
    Tan, Jianlong
    Zhang, Yongdong
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 248 - 256
  • [7] Bidirectional extraction and recognition of scene text with layout consistency
    Ryota Hinami
    Xinhao Liu
    Naoki Chiba
    Shin’ichi Satoh
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2016, 19 : 83 - 98
  • [8] Bidirectional extraction and recognition of scene text with layout consistency
    Hinami, Ryota
    Liu, Xinhao
    Chiba, Naoki
    Satoh, Shin'ichi
    [J]. INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2016, 19 (02) : 83 - 98
  • [9] Scene Text Detection and Recognition Based on Iterative Correction
    Xiong, Li
    Gui, Ziyan
    Ou, Ying
    Xu, Wenxia
    [J]. PROCEEDINGS OF 2022 5TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA2022, 2022, : 7 - 10
  • [10] PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
    Qiao, Zhi
    Zhou, Yu
    Wei, Jin
    Wang, Wei
    Zhang, Yuan
    Jiang, Ning
    Wang, Hongbin
    Wang, Weiping
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2046 - 2055