Multi-Modal Prior-Guided Diffusion Model for Blind Image Super-Resolution

Times cited: 0

Authors:

Huang, Detian [1]

Song, Jiaxun [1]

Huang, Xiaoqian [2]

Hu, Zhenzhen [3]

Zeng, Huanqiang [1]

Affiliations:

[1] Huaqiao Univ, Coll Engn, Quanzhou 362021, Peoples R China

[2] Huaqiao Univ, Coll Informat Sci & Engn, Xiamen 361021, Peoples R China

[3] Hefei Univ Technol, Coll Comp Sci & Informat Engn, Hefei 230009, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2025 / Vol. 32

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Image restoration; Feature extraction; Degradation; Transformers; Diffusion models; Visualization; Superresolution; Navigation; Image reconstruction; Adaptive systems; Blind image super-resolution; diffusion model; multi-modal guidance; transformer model;

DOI:

10.1109/LSP.2024.3516699

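Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;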

Abstract:

Recently, diffusion models have achieved remarkable success in blind image super-resolution. However, most existing methods rely solely on the uni-modal degraded low-resolution (LR) image to guide the diffusion model when restoring high-fidelity images, which limits the realism of the results. In this letter, we propose a Multi-modal Prior-Guided diffusion model for blind image Super-Resolution (MPGSR), which fine-tunes Stable Diffusion (SD) with complementary visual and textual guidance to restore realistic high-resolution images. Specifically, MPGSR consists of two stages: multi-modal guidance extraction and adaptive guidance injection. For the former, we propose a composited transformer and combine it with GPT-CLIP to extract representative visual and textual guidance. For the latter, we design a feature calibration ControlNet to inject the visual guidance and use the cross-attention layers of the frozen SD to inject the textual guidance, thereby effectively activating SD's powerful text-to-image generation capability. Extensive experiments show that MPGSR outperforms state-of-the-art methods in both restoration quality and convergence time.
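The abstract names two routes by which guidance enters the frozen SD denoiser: textual guidance through cross-attention and visual guidance through a ControlNet branch. The sketch below is a minimal NumPy illustration of those two generic mechanisms only, assuming standard latent-diffusion cross-attention (queries from image latents, keys and values from text embeddings) and a plain ControlNet-style zero-initialized residual. It is not the authors' feature calibration ControlNet or composited transformer; the helper names, the random weights, and the shapes (64 latent tokens with 320 channels, 77×768 text embeddings as in SD 1.x) are hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Generic cross-attention: image latents attend to text embeddings (hypothetical helper)."""
    q = image_tokens @ w_q  # (N_img, d) queries from image features
    k = text_tokens @ w_k   # (N_txt, d) keys from textual guidance
    v = text_tokens @ w_v   # (N_txt, d) values from textual guidance
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_img, N_txt) scaled dot products
    return softmax(scores, axis=-1) @ v      # (N_img, d) text-conditioned features


def inject_visual_guidance(unet_feature, control_feature, zero_weight):
    """ControlNet-style residual: a zero-initialized projection of the control branch
    is added to the frozen UNet feature (hypothetical helper, not the paper's calibration)."""
    return unet_feature + control_feature @ zero_weight


# Illustrative usage with random tensors (assumed shapes, not taken from the paper).
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 320))   # 8x8 latent tokens, 320 channels
txt = rng.standard_normal((77, 768))   # CLIP-style text embedding
w_q = rng.standard_normal((320, 320)) * 0.05
w_k = rng.standard_normal((768, 320)) * 0.05
w_v = rng.standard_normal((768, 320)) * 0.05

# Residual connection around cross-attention, as in SD transformer blocks.
fused = img + cross_attention(img, txt, w_q, w_k, w_v)

control = rng.standard_normal((64, 320))  # stand-in for visual guidance features
zero_w = np.zeros((320, 320))             # zero-initialized injection weights
injected = inject_visual_guidance(fused, control, zero_w)
```

Zero-initializing the injection projection leaves the pretrained SD features unchanged at the start of fine-tuning, which is the usual motivation for this design in ControlNet; the control branch's contribution grows only as those weights are trained.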


Pages: 316-320

Number of pages: 5
