Perceiving Multiple Representations for scene text image super-resolution guided by text recognizer

Cited by: 3
Authors
Shi, Qin [1, 4]
Zhu, Yu [1]
Liu, Yatong [1]
Ye, Jiongyao [1]
Yang, Dawei [2, 3, 4]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Fudan Univ, Zhongshan Hosp Xiamen, Dept Pulm & Crit Care Med, Shanghai 361015, Peoples R China
[4] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text image super-resolution; Scene text recognition; Contextual information; Visual features; Frequency domain learning; NEURAL-NETWORK;
DOI
10.1016/j.engappai.2023.106551
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812 ;
Abstract
Single image super-resolution (SISR) aims to recover clear high-resolution images from low-resolution images, and it has made great progress in recent years with the development of deep learning. Scene text image super-resolution (STISR) is a subfield of SISR whose goal is to increase the resolution of a low-resolution text image and enhance the readability of the characters in it. Despite significant improvements in recent approaches, STISR remains a challenging task due to the diversity of backgrounds, text appearances, and layouts. This paper presents a Perceiving Multiple Representations (PerMR) method for better super-resolution performance on scene text images. PerMR is a unified network that combines super-resolution with text recognition and exploits the recognizer's feedback to facilitate super-resolution. Specifically, contextual information from the text decoder is extracted to provide sequence-specific guidance and enable the super-resolution model to pay more attention to the text region. Meanwhile, low-level and high-level visual features from the vision backbone of the recognition network are integrated to further improve visual quality. Additionally, we incorporate a frequency branch into the vanilla convolution unit, which efficiently enhances global and local feature representations. Experiments on the STISR benchmark dataset TextZoom show that PerMR not only generates more distinguishable images but also outperforms the current state-of-the-art methods. Compared to the baseline model TSRN, PerMR boosts the average recognition accuracy by 5.9% using ASTER, 5.8% using MORAN, and 10.6% using CRNN. PerMR outperforms the advanced method TPGSR-3 by 1.4% on ASTER, 0.1% on MORAN, and 0.2% on CRNN, and it improves on TATT by 0.6% on ASTER and 1.1% on MORAN. Furthermore, PerMR demonstrates good robustness and generalization when tackling low-quality text images in multiple scene text recognition datasets. The experimental results verify the capability of PerMR to boost text recognition performance.
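The abstract's idea of adding a frequency branch to a vanilla convolution unit can be illustrated with a toy sketch. This is not the PerMR architecture, whose details the abstract does not give. It is a 1D, pure-Python illustration under stated assumptions: a local branch (a small zero-padded convolution) is summed with a global branch that rescales the signal's Fourier coefficients. All function names and the weights below are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def local_branch(x, kernel):
    """Same-size 1D convolution with zero padding (local receptive field)."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def frequency_branch(x, freq_weights):
    """Rescale each Fourier coefficient, then transform back.

    Every output position depends on every input position, so this branch
    has a global receptive field. Only the real part is kept (assumption:
    the weights are chosen so the result is real, or the imaginary part
    is simply discarded).
    """
    scaled = [w * c for w, c in zip(freq_weights, dft(x))]
    return [v.real for v in idft(scaled)]


def conv_unit_with_frequency_branch(x, kernel, freq_weights):
    """Hypothetical fused unit: local branch plus frequency (global) branch."""
    local = local_branch(x, kernel)
    global_ = frequency_branch(x, freq_weights)
    return [a + b for a, b in zip(local, global_)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    # Identity kernel locally; keep only the DC term globally (the signal mean).
    print(conv_unit_with_frequency_branch(signal, [0.0, 1.0, 0.0], [1, 0, 0, 0]))
```

In this toy setting, keeping only the zero-frequency coefficient makes the frequency branch return the signal mean at every position, which shows how a frequency-domain path can inject global context that a small kernel cannot see.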
Pages: 13
Related papers
50 in total
  • [31] Rethinking Super-Resolution as Text-Guided Details Generation
    Ma, Chenxi
    Yan, Bo
    Lin, Qing
    Tan, Weimin
    Chen, Siming
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3461 - 3469
  • [32] Mask Scene Text Recognizer
    Shi, Haodong
    Peng, Liangrui
    Yan, Ruijie
    Yao, Gang
    Han, Shuman
    Wang, Shengjin
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 33 - 48
  • [33] Parametric loss-based super-resolution for scene text recognition
    Supatta Viriyavisuthisakul
    Parinya Sanguansat
    Teeradaj Racharak
    Minh Le Nguyen
    Natsuda Kaothanthong
    Choochart Haruechaiyasak
    Toshihiko Yamasaki
    Machine Vision and Applications, 2023, 34
  • [34] Parametric loss-based super-resolution for scene text recognition
    Viriyavisuthisakul, Supatta
    Sanguansat, Parinya
    Racharak, Teeradaj
    Le Nguyen, Minh
    Kaothanthong, Natsuda
    Haruechaiyasak, Choochart
    Yamasaki, Toshihiko
    MACHINE VISION AND APPLICATIONS, 2023, 34 (04)
  • [35] Image and Text: Fighting the Same Battle? Super-resolution Learning for Imbalanced Text Classification
    Meunier, Romain
    Benamar, Farah
    Moriceau, Veronique
    Stolfl, Patricia
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 10707 - 10720
  • [36] ICDAR2015 Competition on Text Image Super-Resolution
    Peyrard, Clement
    Baccouche, Moez
    Mamalet, Franck
    Garcia, Christophe
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1201 - 1205
  • [37] ADVERSARIAL TEXT IMAGE SUPER-RESOLUTION USING SINKHORN DISTANCE
    Geng, Cong
    Chen, Li
    Zhang, Xiaoyun
    Gao, Zhiyong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2663 - 2667
  • [38] Anisotropic Total Variation Method for Text Image Super-Resolution
    Bayarsaikhan, Battulga
    Kwon, Younghee
    Kim, Jin Hyung
    PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 2008, : 473 - 479
  • [39] Scene Text Image Super-Resolution Through Multi-Scale Interaction of Structural and Semantic Priors
    Zhu Z.
    Zhang L.
    Bai Y.
    Wang Y.
    Li P.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (07): 1 - 11
  • [40] Navigating Style Variations in Scene Text Image Super-Resolution through Multi-Scale Perception
    Xu, Feifei
    Yu, Ziheng
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 229 - 238