Perceiving Multiple Representations for scene text image super-resolution guided by text recognizer

Cited by: 3
Authors
Shi, Qin [1,4]
Zhu, Yu [1]
Liu, Yatong [1]
Ye, Jiongyao [1]
Yang, Dawei [2,3,4]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Fudan Univ, Zhongshan Hosp Xiamen, Dept Pulm & Crit Care Med, Shanghai 361015, Peoples R China
[4] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene text image super-resolution; Scene text recognition; Contextual information; Visual features; Frequency domain learning; NEURAL-NETWORK;
DOI
10.1016/j.engappai.2023.106551
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Single image super-resolution (SISR) aims to recover clear high-resolution images from low-resolution inputs and has made great progress with the development of deep learning in recent years. Scene text image super-resolution (STISR) is a subfield of SISR whose goal is to increase the resolution of a low-resolution text image and enhance the readability of the characters in it. Despite significant improvements in recent approaches, STISR remains challenging because of the diversity of backgrounds, text appearances, layouts, etc. This paper presents a Perceiving Multiple Representations (PerMR) method for better super-resolution performance on scene text images. PerMR is a unified network that combines super-resolution with text recognition and exploits the recognizer's feedback to facilitate super-resolution. Specifically, contextual information from the text decoder is extracted to provide sequence-specific guidance and make the super-resolution model pay more attention to the text region. Meanwhile, low-level and high-level visual features from the vision backbone of the recognition network are integrated to further improve visual quality. Additionally, we incorporate a frequency branch into the vanilla convolution unit, which efficiently enhances both global and local feature representations. Experiments on the STISR benchmark dataset TextZoom validate that PerMR not only generates more distinguishable images but also outperforms the current state-of-the-art methods. Compared with the baseline model TSRN, PerMR boosts the average recognition accuracy by 5.9% with ASTER, 5.8% with MORAN and 10.6% with CRNN. PerMR outperforms the advanced method TPGSR-3 by 1.4% on ASTER, 0.1% on MORAN and 0.2% on CRNN, and outperforms TATT by 0.6% on ASTER and 1.1% on MORAN. Furthermore, PerMR demonstrates good robustness and generalization when handling low-quality text images from multiple scene text recognition datasets. The experimental results verify the capability of PerMR to boost text recognition performance.
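The abstract does not say how the frequency branch inside the convolution unit is built. The sketch below shows one common way to pair a local convolution with a global frequency-domain branch, so the "global and local" claim is concrete. It assumes a single-channel feature map, a naive 2D DFT in place of an FFT, and a per-frequency complex filter in the style of global-filter networks. All function names (`dft2`, `conv3x3`, `frequency_branch`, `freq_conv_unit`) are hypothetical, and this is not PerMR's actual unit.

```python
import cmath
from typing import List

Matrix = List[List[float]]
CMatrix = List[List[complex]]


def dft2(x: CMatrix, inverse: bool = False) -> CMatrix:
    """Naive O((HW)^2) 2D discrete Fourier transform (illustration only, not an FFT)."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for m in range(h):
                for n in range(w):
                    angle = sign * 2 * cmath.pi * (u * m / h + v * n / w)
                    acc += x[m][n] * cmath.exp(1j * angle)
            # The inverse transform carries the 1/(H*W) normalisation.
            out[u][v] = acc / (h * w) if inverse else acc
    return out


def conv3x3(x: Matrix, k: Matrix) -> Matrix:
    """Local branch: 3x3 cross-correlation with zero padding (same output size)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out


def frequency_branch(x: Matrix, filt: CMatrix) -> Matrix:
    """Global branch (assumed design): DFT -> per-frequency complex weights -> inverse DFT.

    Every spectral coefficient depends on every input pixel, so this branch has a
    global receptive field. The real part is kept so the output stays a real map.
    """
    h, w = len(x), len(x[0])
    spec = dft2([[complex(v) for v in row] for row in x])
    mixed = [[spec[u][v] * filt[u][v] for v in range(w)] for u in range(h)]
    back = dft2(mixed, inverse=True)
    return [[z.real for z in row] for row in back]


def freq_conv_unit(x: Matrix, k: Matrix, filt: CMatrix) -> Matrix:
    """Hypothetical unit: sum of the local 3x3 branch and the global frequency branch."""
    local = conv3x3(x, k)
    glob = frequency_branch(x, filt)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(local, glob)]
```

With an all-ones filter the frequency branch returns its input unchanged, while a linear phase ramp such as `exp(-2πi(2u/H + 2v/W))` circularly shifts the map by two pixels in each direction. A single frequency-domain multiplication can therefore move information across the whole map, which one 3x3 kernel cannot do. A practical version would use multi-channel tensors, learned filters and an FFT routine such as PyTorch's `torch.fft.rfft2` / `torch.fft.irfft2`.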
Pages: 13
Related papers
50 in total
  • [41] TextSRNet: Scene Text Super-Resolution Based on Contour Prior and Atrous Convolution
    Ma, Jizhao
    Jin, Lianwen
    Zhang, Jiaxin
    Jiang, Jiajia
    Xue, Yang
    He, Mengchao
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3252 - 3258
  • [42] Multiple Learned Dictionaries based Clustered Sparse Coding for the Super-Resolution of Single Text Image
    Walha, Rim
    Drira, Fadoua
    Lebourgeois, Franck
    Garcia, Christophe
    Alimi, Adel M.
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 484 - 488
  • [43] Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution
    Zhang, Wenyu
    Deng, Xin
    Jia, Baojun
    Yu, Xingtong
    Chen, Yifan
    Ma, Jin
    Ding, Qing
    Zhang, Xinming
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2168 - 2179
  • [44] Super-Resolution of Text Image Based on Conditional Generative Adversarial Network
    Wang, Yuyang
    Ding, Wenjun
    Su, Feng
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 270 - 281
  • [45] Learning Generative Structure Prior for Blind Text Image Super-resolution
    Li, Xiaoming
    Zuo, Wangmeng
    Loy, Chen Change
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10103 - 10113
  • [46] Pixel-Level Degradation for Text Image Super-Resolution and Recognition
    Qian, Xiaohong
    Xie, Lifeng
    Ye, Ning
    Le, Renlong
    Yang, Shengying
    ELECTRONICS, 2023, 12 (21)
  • [47] CNN-Based Text Image Super-Resolution Tailored for OCR
    Zhang, Haochen
    Liu, Dong
    Xiong, Zhiwei
    2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2017,
  • [48] Coarse-to-fine text injecting for realistic image super-resolution
    Chen, Xiaoyu
    Bai, Chao
    Wu, Zhenyao
    Wu, Xinyi
    Zou, Qi
    Xia, Yong
    Wang, Song
    NEUROCOMPUTING, 2025, 626
  • [49] Scene text image super-resolution using multi-scale convolutional neural network with skip connections
    Walha, Rim
    Aouini, Amal
    APPLIED INTELLIGENCE, 2024, : 5931 - 5943