A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution

Cited by: 16
Authors
Ma, Jianqi [1]
Liang, Zhetong [2]
Zhang, Lei [1]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] OPPO Res, Shenzhen, Peoples R China
DOI
10.1109/CVPR52688.2022.00582
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text image super-resolution aims to increase the resolution and readability of the text in low-resolution images. Although deep convolutional neural networks (CNNs) have brought significant improvement, it remains difficult to reconstruct high-resolution images for spatially deformed text, especially rotated and curve-shaped text. This is because current CNN-based methods adopt locality-based operations, which are not effective in dealing with the variation caused by deformations. In this paper, we propose a CNN-based Text ATTention network (TATT) to address this problem. The semantics of the text are first extracted by a text recognition module as text prior information. Then we design a novel transformer-based module, which leverages a global attention mechanism, to exert the semantic guidance of the text prior on the text reconstruction process. In addition, we propose a text structure consistency loss to refine the visual appearance by imposing structural consistency on the reconstructions of regular and deformed texts. Experiments on the benchmark TextZoom dataset show that the proposed TATT not only achieves state-of-the-art performance in terms of PSNR/SSIM metrics, but also significantly improves the recognition accuracy in the downstream text recognition task, particularly for text instances with multi-orientation and curved shapes. Code is available at https://github.com/mjq11302010044/TATT.
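The abstract's central idea, letting every image location attend to the whole text prior instead of only a local neighborhood, can be illustrated with a toy scaled dot-product cross-attention. This is a minimal sketch under stated assumptions, not the authors' implementation (which is in the repository linked above): the names `cross_attention`, `image_feats`, and `text_prior` are hypothetical, and the learned query/key/value projections of a real transformer layer are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_feats, text_prior):
    """Toy global cross-attention (assumption: no learned projections).

    image_feats: N x d list of image-location features, used as queries.
    text_prior:  L x d list of per-character prior features, used as
                 both keys and values.
    Returns (outputs, weights): N x d attended features, N x L weights.
    """
    d = len(text_prior[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in image_feats:
        # Every query is scored against every text-prior entry, so the
        # receptive field is global rather than local.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in text_prior]
        w = softmax(scores)
        out = [sum(w[j] * text_prior[j][c] for j in range(len(text_prior)))
               for c in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Illustrative values only: three image locations, two prior entries.
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_prior = [[2.0, 0.0], [0.0, 2.0]]
outputs, weights = cross_attention(image_feats, text_prior)
```

Because each query is compared with all prior entries regardless of position, the weighting does not depend on where a character sits in the image, which is the property the abstract relies on for rotated and curved text.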
Pages: 5901-5910
Number of pages: 10