Rethinking Super-Resolution as Text-Guided Details Generation

Times cited: 0
Authors
Ma, Chenxi [1]
Yan, Bo [1]
Lin, Qing [1]
Tan, Weimin [1]
Chen, Siming [2]
Affiliations
[1] Fudan Univ, Shanghai Collaborat Innovat Ctr Intelligent Visua, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
Keywords
single image super-resolution; text-guided super-resolution; multi-modal fusion learning
DOI
10.1145/3503161.3547951
CLC number (Chinese Library Classification)
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Deep neural networks have greatly improved the performance of single image super-resolution (SISR). However, conventional methods still restore a single high-resolution (HR) solution based only on the input image modality, and image-level information alone is insufficient to predict adequate details and photo-realistic visual quality at large upscaling factors (×8, ×16). In this paper, we propose a new perspective that regards SISR as a semantic image detail enhancement problem, with the goal of generating semantically reasonable HR images that are faithful to the ground truth. To enhance the semantic accuracy and visual quality of the reconstructed image, we explore multi-modal fusion learning in SISR by proposing a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize information from both the text and image modalities. Unlike existing methods, TGSR can generate HR image details that match the text descriptions through a coarse-to-fine process. Extensive experiments and ablation studies demonstrate the effectiveness of TGSR, which exploits text references to recover realistic images.
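The claim that image-level information is insufficient at large upscaling factors can be made concrete with a back-of-the-envelope count. This Python sketch is illustrative only and is not taken from the paper. It assumes an integer scale factor s applied to both height and width, so that each LR pixel corresponds to an s × s block of HR pixels, and it treats every HR pixel beyond one per block as something the model must predict rather than read off the input. That is a simplification, since an LR pixel is usually a blurred average of its block rather than a copy of one HR pixel. The function names are hypothetical.

```python
def hr_pixels_per_lr_pixel(scale: int) -> int:
    """Number of HR output pixels covered by one LR input pixel when both
    height and width are upscaled by the integer factor `scale`."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return scale * scale


def inferred_fraction(scale: int) -> float:
    """Fraction of HR pixels that have no one-to-one LR counterpart.

    Simplifying assumption (not from the paper): each LR pixel pins down at
    most one HR pixel in its s x s block; the remaining s*s - 1 must be
    predicted from context or, in a text-guided model, from extra cues.
    """
    return 1.0 - 1.0 / hr_pixels_per_lr_pixel(scale)


if __name__ == "__main__":
    # Compare moderate factors with the large ones (x8, x16) named in the abstract.
    for s in (2, 4, 8, 16):
        print(f"x{s}: {hr_pixels_per_lr_pixel(s)} HR pixels per LR pixel, "
              f"{inferred_fraction(s):.2%} must be inferred")
```

At ×8 each input pixel has to account for 64 output pixels, and at ×16 for 256, which is the regime where the abstract argues that a second modality such as text becomes useful.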
Pages: 3461-3469
Page count: 9