IterVM: Iterative Vision Modeling Module for Scene Text Recognition

Cited by: 2
Authors
Chu, Xiaojie [1]
Wang, Yongtao [1]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
Keywords
NETWORK;
DOI
10.1109/ICPR56361.2022.9956029
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text recognition (STR) is challenging because of the imperfect imaging conditions in natural images. State-of-the-art methods use both visual cues and linguistic knowledge to tackle this problem. Specifically, they use an iterative language modeling module (IterLM) to repeatedly refine the output sequence from the vision modeling module (VM). Although these methods achieve promising results, the vision modeling module has become their performance bottleneck. In this paper, we propose an iterative vision modeling module (IterVM) to further improve STR accuracy. Specifically, the first VM extracts multi-level features directly from the input image, and each following VM re-extracts multi-level features from the input image and fuses them with the high-level (i.e., the most semantic) feature extracted by the previous VM. By combining the proposed IterVM with the iterative language modeling module, we further propose a powerful scene text recognizer called IterNet. Extensive experiments demonstrate that the proposed IterVM significantly improves scene text recognition accuracy, especially on low-quality scene text images. Moreover, IterNet achieves new state-of-the-art results on several public benchmarks. Code and models are available at https://github.com/VDIGPKU/IterNet.
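The iteration described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`extract_levels`, `fuse`, `iter_vm`), the halving "backbone", and element-wise addition as the fusion operator are all assumptions made for illustration, and every iteration reuses the same toy extractor, whereas the real model may use separately parameterized VMs.

```python
from typing import List, Optional

Feature = List[float]


def extract_levels(image: Feature) -> List[Feature]:
    """Hypothetical stand-in for a backbone: returns [low, mid, high] features.

    Each level simply halves the previous one so the arithmetic is easy to follow.
    """
    low = [v / 2 for v in image]
    mid = [v / 2 for v in low]
    high = [v / 2 for v in mid]
    return [low, mid, high]


def fuse(levels: List[Feature], prev_high: Optional[Feature]) -> List[Feature]:
    """Fuse every level with the previous VM's high-level feature.

    Element-wise addition is an assumption; the abstract does not name the operator.
    """
    if prev_high is None:
        # First VM: nothing to fuse with, use the extracted features directly.
        return levels
    return [[a + b for a, b in zip(level, prev_high)] for level in levels]


def iter_vm(image: Feature, num_iters: int) -> List[Feature]:
    """Run the toy VM num_iters times; return each iteration's high-level feature."""
    prev_high: Optional[Feature] = None
    outputs: List[Feature] = []
    for _ in range(num_iters):
        # Every VM re-extracts features from the original input image.
        levels = fuse(extract_levels(image), prev_high)
        prev_high = levels[-1]  # the highest (most semantic) level
        outputs.append(prev_high)
    return outputs


if __name__ == "__main__":
    for step, feature in enumerate(iter_vm([8.0, 4.0], num_iters=3), start=1):
        print(step, feature)
```

The point of the sketch is the data flow: each pass starts again from the input image rather than from the previous pass's output, and only the previous high-level feature is carried forward.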
Pages: 1393-1399
Page count: 7
Related papers
50 in total
  • [1] Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
    Fang, Shancheng
    Xie, Hongtao
    Wang, Yuxin
    Mao, Zhendong
    Zhang, Yongdong
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 7094 - 7103
  • [2] Scene Text Detection and Recognition Based on Iterative Correction
    Xiong, Li
    Gui, Ziyan
    Ou, Ying
    Xu, Wenxia
    [J]. PROCEEDINGS OF 2022 5TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA2022, 2022, : 7 - 10
  • [3] Vision Transformer for Fast and Efficient Scene Text Recognition
    Atienza, Rowel
    [J]. DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I, 2021, 12821 : 319 - 334
  • [4] ANALYSIS OF THE NOVEL TRANSFORMER MODULE COMBINATION FOR SCENE TEXT RECOGNITION
    Kim, Yeon-Gyu
    Kim, Hyunsu
    Kang, Minseok
    Lee, Hyug-Jae
    Lee, Rokkyu
    Park, Gunhan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1229 - 1233
  • [5] PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
    Qiao, Zhi
    Zhou, Yu
    Wei, Jin
    Wang, Wei
    Zhang, Yuan
    Jiang, Ning
    Wang, Hongbin
    Wang, Weiping
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2046 - 2055
  • [6] Scene text recognition with context-aware autonomous bidirectional iterative models
    Zhao, Xiaoqing
    Xu, Miaomiao
    Li, Yanbing
    Huang, Hao
    Silamu, Wushour
    [J]. Journal of Intelligent and Fuzzy Systems, 2024, 46 (04): : 8605 - 8616
  • [7] IFR: Iterative Fusion Based Recognizer for Low Quality Scene Text Recognition
    Jia, Zhiwei
    Xu, Shugong
    Mu, Shiyi
    Tao, Yue
    Cao, Shan
    Chen, Zhiyong
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 180 - 191
  • [8] Central and peripheral vision for scene recognition: A neurocomputational modeling exploration
    Wang, Panqu
    Cottrell, Garrison W.
    [J]. JOURNAL OF VISION, 2017, 17 (04): : 1 - 22
  • [9] ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification
    Zhan, Fangneng
    Lu, Shijian
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2054 - 2063
  • [10] FULLY SHAREABLE SCENE TEXT RECOGNITION MODELING FOR HORIZONTAL AND VERTICAL WRITING
    Orihashi, Shota
    Yamazaki, Yoshihiro
    Uchida, Mihiro
    Takashima, Akihiko
    Masumura, Ryo
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2636 - 2640