Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 146

Authors:

Fang, Shancheng [1]

Xie, Hongtao [1]

Wang, Yuxin [1]

Mao, Zhendong [1]

Zhang, Yongdong [1]

Affiliations:

[1] University of Science and Technology of China, Hefei, Anhui, People's Republic of China

Source:

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 | 2021

DOI:

10.1109/CVPR46437.2021.00702

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition.
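Illustrative note (not part of the original record): the iterative-correction idea in the abstract can be sketched with a toy example. Everything below is an assumption made for illustration. `LEXICON`, `cloze_predict`, `iterative_correction`, and the confidence threshold are hypothetical stand-ins, and a lexicon vote replaces the paper's neural networks and its learned fusion. The sketch only shows the control flow: a noisy vision prediction is passed repeatedly through a cloze-style language step that predicts each uncertain character from the characters on both sides of it.

```python
# Toy sketch of iterative correction; NOT ABINet's actual networks.
# LEXICON and both functions are hypothetical stand-ins for illustration.

LEXICON = ["house", "horse", "mouse"]


def cloze_predict(text, position):
    """Predict the character at `position` from the other characters only.

    Stand-in for a bidirectional cloze language model: the character at
    `position` is treated as masked, and same-length lexicon words are scored
    by how many of the remaining (left and right) characters they match.
    """
    best_char, best_score = text[position], -1
    for word in LEXICON:
        if len(word) != len(text):
            continue
        # Count context matches, skipping the masked position.
        score = sum(
            1
            for i, (a, b) in enumerate(zip(word, text))
            if i != position and a == b
        )
        if score > best_score:  # ties keep the earlier lexicon word
            best_char, best_score = word[position], score
    return best_char


def iterative_correction(vision_text, vision_confidence, iterations=3, threshold=0.5):
    """Refine a noisy vision prediction over several language-model passes."""
    text = vision_text
    for _ in range(iterations):
        refined = []
        for i in range(len(text)):
            if vision_confidence[i] >= threshold:
                # Crude stand-in for fusion: keep confident vision outputs.
                refined.append(vision_text[i])
            else:
                # Otherwise take the cloze prediction from the current text.
                refined.append(cloze_predict(text, i))
        new_text = "".join(refined)
        if new_text == text:  # nothing changed, stop early
            break
        text = new_text
    return text


if __name__ == "__main__":
    # Two low-confidence characters in a hypothetical vision output.
    print(iterative_correction("mxuxe", [0.9, 0.1, 0.9, 0.1, 0.9]))
```

In this toy setting each pass feeds the previous output back into the language step, which is the part that corresponds to the "iterative correction" wording in the abstract; the gradient blocking between vision and language models is a training-time detail that this sketch does not model.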

Pages: 7094-7103

Page count: 10

