Multi-modal degradation feature learning for unified image restoration based on contrastive learning

Cited: 0
Authors
Chen, Lei [1 ]
Xiong, Qingbo [1 ]
Zhang, Wei [1 ,2 ]
Liang, Xiaoli [1 ]
Gan, Zhihua [1 ]
Li, Liqiang [3 ]
He, Xin [1 ]
Affiliations
[1] Henan Univ, Sch Software, Jinming Rd, Kaifeng 475004, Peoples R China
[2] China Univ Labor Relat, Sch Appl Technol, Zengguang Rd, Beijing 100048, Peoples R China
[3] Shangqiu Normal Univ, Sch Phys, Shangqiu 476000, Peoples R China
Funding
US National Science Foundation;
Keywords
Unified image restoration; Multi-modal features; Contrastive learning; Deep learning;
DOI
10.1016/j.neucom.2024.128955
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we address the unified image restoration challenge by reframing it as a contrastive learning-based classification problem. Despite the significant strides made by deep learning methods in enhancing image restoration quality, their limited capacity to generalize across diverse degradation types and intensities necessitates training separate models for each specific degradation scenario. We propose an all-encompassing approach that can restore images from various unknown corruption types and levels. We devise a method that learns representations of the latent sharp image's degradation and accompanying textual features (such as dataset categories and image content descriptions), converting these into prompts that are then embedded within a reconstruction network to enhance cross-database restoration performance. This culminates in a unified image reconstruction framework. The study involves two stages. In the first stage, we design a MultiContentNet that learns multi-modal features (MMFs) of the latent sharp image. This network encodes visual degradation expressions and contextual text features into latent variables, thereby exerting a guided classification effect. Specifically, MultiContentNet is trained as an auxiliary controller that takes the degraded input image and, through contrastive learning, extracts MMFs of the latent target image. This effectively generates natural classifiers tailored to different degradation types. The second stage integrates the learned MMFs into an image restoration network via cross-attention mechanisms, guiding the restoration model to learn high-fidelity image recovery. Experiments conducted on six blind image restoration tasks demonstrate that the proposed method achieves state-of-the-art performance, highlighting the potential significance of large-scale pretrained vision-language models' MMFs in advancing high-quality unified image reconstruction.
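The two building blocks the abstract describes, contrastive alignment of degradation/text embeddings (stage one) and cross-attention injection of the learned MMF tokens into a restoration network (stage two), can be sketched generically. The paper's actual architectures (MultiContentNet, the reconstruction backbone) are not reproduced here; the symmetric InfoNCE loss and single-head cross-attention below are standard stand-ins, written in plain NumPy with illustrative shapes.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of img_emb and row i of txt_emb are a
    positive pair; all other rows in the batch act as negatives."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature          # (B, B) cosine similarities
    labels = np.arange(len(logits))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)           # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def cross_attention(queries, mmf_tokens, d_k):
    """Restoration features (queries) attend over MMF prompt tokens,
    which serve as both keys and values."""
    scores = queries @ mmf_tokens.T / np.sqrt(d_k)        # (N, T)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ mmf_tokens                           # (N, D)

rng = np.random.default_rng(0)
B, D = 4, 32
img_emb = rng.normal(size=(B, D))                 # degraded-image embeddings
txt_emb = img_emb + 0.1 * rng.normal(size=(B, D)) # roughly aligned text embeddings
loss = info_nce_loss(img_emb, txt_emb)

feat = rng.normal(size=(16, D))   # flattened spatial features of the restorer
mmf = rng.normal(size=(8, D))     # 8 learned MMF prompt tokens
out = cross_attention(feat, mmf, D)
print(float(loss), out.shape)
```

Because the text embeddings here are near-copies of the image embeddings, the contrastive loss is close to zero, which is the training signal that pulls matching degradation/text pairs together and pushes mismatched pairs apart; the cross-attention output has the same shape as the query features, so it can be added or concatenated back into the restoration branch.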
Pages: 11
Related Papers (50 in total)
  • [31] Multi-modal Graph Contrastive Learning for Micro-video Recommendation
    Yi, Zixuan
    Wang, Xi
    Ounis, Iadh
    Macdonald, Craig
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 1807 - 1811
  • [32] Mutually-Guided Hierarchical Multi-Modal Feature Learning for Referring Image Segmentation
    Li, Jiachen
    Xie, Qing
    Chang, Xiaojun
    Xu, Jinyu
    Liu, Yongjian
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (12)
  • [33] Reliable multi-modal prototypical contrastive learning for difficult airway assessment
    Li, Xiaofan
    Peng, Bo
    Yao, Yuan
    Zhang, Guangchao
    Xie, Zhuyang
    Saleem, Muhammad Usman
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 273
  • [34] Deep Learning Based Multi-modal Cardiac MR Image Segmentation
    Zheng, Rencheng
    Zhao, Xingzhong
    Zhao, Xingming
    Wang, He
    STATISTICAL ATLASES AND COMPUTATIONAL MODELS OF THE HEART: MULTI-SEQUENCE CMR SEGMENTATION, CRT-EPIGGY AND LV FULL QUANTIFICATION CHALLENGES, 2020, 12009 : 263 - 270
  • [35] Optimized transfer learning based multi-modal medical image retrieval
    Abid, Muhammad Haris
    Ashraf, Rehan
    Mahmood, Toqeer
    Faisal, C. M. Nadeem
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44069 - 44100
  • [36] Image caption of space science experiment based on multi-modal learning
    Li P.-Z.
    Wan X.
    Li S.-Y.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2021, 29 (12): 2944 - 2955
  • [37] IMAGE DESCRIPTION THROUGH FUSION BASED RECURRENT MULTI-MODAL LEARNING
    Oruganti, Ram Manohar
    Sah, Shagan
    Pillai, Suhas
    Ptucha, Raymond
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3613 - 3617
  • [38] A Hard Negatives Mining and Enhancing Method for Multi-Modal Contrastive Learning
    Li, Guangping
    Gao, Yanan
    Huang, Xianhui
    Ling, Bingo Wing-Kuen
    ELECTRONICS, 2025, 14 (04)
  • [39] ConOffense: Multi-modal multitask Contrastive learning for offensive content identification
    Shome, Debaditya
    Kar, T.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 4524 - 4529
  • [40] CLMTR: a generic framework for contrastive multi-modal trajectory representation learning
    Liang, Anqi
    Yao, Bin
    Xie, Jiong
    Zheng, Wenli
    Shen, Yanyan
    Ge, Qiqi
    GEOINFORMATICA, 2024, : 233 - 253