Perceiving Multiple Representations for scene text image super-resolution guided by text recognizer

Cited by: 3
Authors
Shi, Qin [1, 4]
Zhu, Yu [1]
Liu, Yatong [1]
Ye, Jiongyao [1]
Yang, Dawei [2, 3, 4]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Fudan Univ, Zhongshan Hosp Xiamen, Dept Pulm & Crit Care Med, Shanghai 361015, Peoples R China
[4] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene text image super-resolution; Scene text recognition; Contextual information; Visual features; Frequency domain learning; NEURAL-NETWORK;
DOI
10.1016/j.engappai.2023.106551
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Single image super-resolution (SISR) aims to recover clear high-resolution images from low-resolution images and has made great progress in recent years with the development of deep learning. Scene text image super-resolution (STISR) is a subfield of SISR whose goal is to increase the resolution of a low-resolution text image and enhance the readability of the characters in it. Despite significant improvements in recent approaches, STISR remains a challenging task because of the diversity of backgrounds, text appearances, and layouts. This paper presents a Perceiving Multiple Representations (PerMR) method for better super-resolution performance on scene text images. PerMR is a unified network that combines super-resolution with text recognition and exploits the recognizer's feedback to facilitate super-resolution. Specifically, contextual information from the text decoder is extracted to provide sequence-specific guidance and enable the super-resolution model to pay more attention to the text region. Meanwhile, low-level and high-level visual features from the vision backbone of the recognition network are integrated to further improve visual quality. Additionally, we incorporate a frequency branch into the vanilla convolution unit, which efficiently enhances global and local feature representations. Experiments on the STISR benchmark dataset TextZoom show that PerMR not only generates more distinguishable images but also outperforms the current state-of-the-art methods. Compared with the baseline model TSRN, PerMR boosts the average recognition accuracy by 5.9% using ASTER, 5.8% using MORAN, and 10.6% using CRNN. PerMR outperforms the advanced method TPGSR-3 by 1.4% on ASTER, 0.1% on MORAN, and 0.2% on CRNN, and outperforms TATT by 0.6% on ASTER and 1.1% on MORAN. Furthermore, PerMR demonstrates good robustness and generalization when tackling low-quality text images in multiple scene text recognition datasets. The experimental results verify the capability of PerMR to boost text recognition performance.
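The abstract does not say how the frequency branch is built, so the following is only an illustrative sketch of the general idea and not the PerMR implementation. It assumes a single-channel feature map, a 3x3 spatial convolution for local features, and a parallel branch that rescales the 2-D Fourier spectrum so that each output value depends on the whole input. The function names and the `freq_weight` parameter are hypothetical.

```python
import numpy as np


def spatial_branch(x, kernel):
    """3x3 zero-padded cross-correlation: each output sees a local 3x3 window."""
    h, w = x.shape
    padded = np.pad(x, 1)  # zero padding keeps the output the same size as x
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def frequency_branch(x, freq_weight):
    """Pointwise scaling of the real 2-D FFT: each output sees the whole input.

    freq_weight (hypothetical, would be learned) has shape (H, W // 2 + 1),
    matching the output of np.fft.rfft2.
    """
    spectrum = np.fft.rfft2(x)
    return np.fft.irfft2(spectrum * freq_weight, s=x.shape)


def conv_unit_with_frequency_branch(x, kernel, freq_weight):
    """Assumed fusion rule: sum the local (spatial) and global (frequency) branches."""
    return spatial_branch(x, kernel) + frequency_branch(x, freq_weight)


# Usage: a random feature map with a random kernel and spectral weights.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((16, 64))
kernel = rng.standard_normal((3, 3))
freq_weight = rng.standard_normal((16, 64 // 2 + 1))
fused = conv_unit_with_frequency_branch(feature_map, kernel, freq_weight)
```

Summation is just one plausible way to fuse the two branches; a real network would operate on multi-channel tensors and learn both the kernel and the spectral weights.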
Pages: 13
Related papers
50 in total
  • [1] Text Prior Guided Scene Text Image Super-Resolution
    Ma, Jianqi
    Guo, Shi
    Zhang, Lei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1341 - 1353
  • [2] Scene Text Telescope: Text-Focused Scene Image Super-Resolution
    Chen, Jingye
    Li, Bin
    Xue, Xiangyang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 12021 - 12030
  • [3] GARDEN: Generative Prior Guided Network for Scene Text Image Super-Resolution
    Kong, Yuxin
    Ma, Weihong
    Jin, Lianwen
    Xue, Yang
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 196 - 214
  • [4] Text Image Super-Resolution Guided by Text Structure and Embedding Priors
    Huang, Cong
    Peng, Xiulian
    Liu, Dong
    Lu, Yan
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)
  • [5] Text Gestalt: Stroke-Aware Scene Text Image Super-resolution
    Chen, Jingye
    Yu, Haiyang
    Ma, Jianqi
    Li, Bin
    Xue, Xiangyang
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022 : 285 - 293
  • [6] Scene Text Image Super-Resolution Reconstruction Based on Perceiving Multi-Domain Character Distance
    Huang, Jun-Yang
    Chen, Hong-Hui
    Wang, Jia-Bao
    Chen, Ping-Ping
    Lin, Zhi-Jian
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (07) : 2262 - 2270
  • [7] Batch-transformer for scene text image super-resolution
    Sun, Yaqi
    Xie, Xiaolan
    Li, Zhi
    Yang, Kai
    VISUAL COMPUTER, 2024, 40 (10) : 7399 - 7409
  • [8] A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution
    Ma, Jianqi
    Liang, Zhetong
    Zhang, Lei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 5901 - 5910
  • [9] Scene Text Image Super-Resolution Via Semantic Distillation and Text Perceptual Loss
    Zhao, Cairong
    Shu, Rui
    Feng, Shuyang
    Zhu, Liang
    Wang, Xuekuan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1153 - 1164
  • [10] Multi-Task Learning for Scene Text Image Super-Resolution with Multiple Transformers
    Honda, Kosuke
    Kurematsu, Masaki
    Fujita, Hamido
    Selamat, Ali
    ELECTRONICS, 2022, 11 (22)