Multi-Modal Prior-Guided Diffusion Model for Blind Image Super-Resolution

Cited by: 0
Authors
Huang, Detian [1]
Song, Jiaxun [1]
Huang, Xiaoqian [2]
Hu, Zhenzhen [3]
Zeng, Huanqiang [1]
Affiliations
[1] Huaqiao Univ, Coll Engn, Quanzhou 362021, Peoples R China
[2] Huaqiao Univ, Coll Informat Sci & Engn, Xiamen 361021, Peoples R China
[3] Hefei Univ Technol, Coll Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Image restoration; Feature extraction; Degradation; Transformers; Diffusion models; Visualization; Superresolution; Navigation; Image reconstruction; Adaptive systems; Blind image super-resolution; diffusion model; multi-modal guidance; transformer model;
DOI
10.1109/LSP.2024.3516699
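Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809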
Abstract
Recently, diffusion models have achieved remarkable success in blind image super-resolution. However, most existing methods rely solely on uni-modal degraded low-resolution images to guide diffusion models for restoring high-fidelity images, resulting in inferior realism. In this letter, we propose a Multi-modal Prior-Guided diffusion model for blind image Super-Resolution (MPGSR), which fine-tunes Stable Diffusion (SD) by utilizing the superior visual-and-textual guidance for restoring realistic high-resolution images. Specifically, our MPGSR involves two stages, i.e., multi-modal guidance extraction and adaptive guidance injection. For the former, we propose a composited transformer and further incorporate it with GPT-CLIP to extract the representative visual-and-textual guidance. For the latter, we design a feature calibration ControlNet to inject the visual guidance and employ the cross-attention layer provided by the frozen SD to inject the textual guidance, thus effectively activating the powerful text-to-image generation potential. Extensive experiments show that our MPGSR outperforms state-of-the-art methods in restoration quality and convergence time.
Pages: 316-320
Number of pages: 5