Multi-modal degradation feature learning for unified image restoration based on contrastive learning

Cited: 0
Authors
Chen, Lei [1]
Xiong, Qingbo [1]
Zhang, Wei [1,2]
Liang, Xiaoli [1]
Gan, Zhihua [1]
Li, Liqiang [3]
He, Xin [1]
Affiliations
[1] Henan Univ, Sch Software, Jinming Rd, Kaifeng 475004, Peoples R China
[2] China Univ Labor Relat, Sch Appl Technol, Zengguang Rd, Beijing 100048, Peoples R China
[3] Shangqiu Normal Univ, Sch Phys, Shangqiu 476000, Peoples R China
Funding
US National Science Foundation;
Keywords
Unified image restoration; Multi-modal features; Contrastive learning; Deep learning;
DOI
10.1016/j.neucom.2024.128955
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we address the unified image restoration challenge by reframing it as a contrastive-learning-based classification problem. Although deep learning methods have made significant strides in improving restoration quality, their limited ability to generalize across diverse degradation types and intensities forces a separate model to be trained for each degradation scenario. We propose an all-encompassing approach that restores images corrupted by unknown types and levels of degradation. Our method learns representations of the latent sharp image's degradation together with accompanying textual features (such as dataset categories and image content descriptions), converts them into prompts, and embeds these prompts in a reconstruction network to improve cross-dataset restoration performance, yielding a unified image reconstruction framework. The method comprises two stages. In the first stage, we design MultiContentNet, which learns multi-modal features (MMFs) of the latent sharp image: it encodes visual degradation cues and contextual text features into latent variables, thereby providing classification guidance. Specifically, MultiContentNet is trained as an auxiliary controller that takes the degraded input image and, through contrastive learning, extracts MMFs of the latent target image, effectively producing natural classifiers for the different degradation types. The second stage integrates the learned MMFs into an image restoration network via cross-attention, guiding the restoration model toward high-fidelity recovery. Experiments on six blind image restoration tasks demonstrate state-of-the-art performance and highlight the value of MMFs from large-scale pretrained vision-language models for high-quality unified image reconstruction.
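The abstract describes a two-stage design: contrastive learning first pulls a degraded image's MMF toward its paired text embedding so that degradation types separate into natural classes, and cross-attention then injects that MMF into the restoration network. Below is a minimal, hypothetical PyTorch sketch of both stages; DegradationEncoder, info_nce, and MMFCrossAttention are illustrative names of our own choosing, not the paper's actual MultiContentNet or restoration backbone, and all dimensions and the temperature are assumptions.

```python
# A minimal sketch (assumptions throughout): PyTorch, CLIP-like text
# embeddings, and invented names -- NOT the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationEncoder(nn.Module):
    """Stage 1 (hypothetical stand-in for MultiContentNet): map a degraded
    image to a normalized multi-modal feature (MMF) vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):
        return F.normalize(self.proj(self.backbone(x)), dim=-1)

def info_nce(img_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE: pull each image's MMF toward its paired text
    embedding (dataset category / content description) and push apart the
    other pairs in the batch -- a 'natural classifier' over degradations."""
    logits = img_feats @ text_feats.t() / temperature
    labels = torch.arange(len(img_feats), device=img_feats.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

class MMFCrossAttention(nn.Module):
    """Stage 2 (hypothetical): restoration features attend to the MMF
    prompt, so the learned degradation context guides reconstruction."""
    def __init__(self, feat_dim=64, mmf_dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, kdim=mmf_dim,
                                          vdim=mmf_dim, batch_first=True)

    def forward(self, feat_map, mmf):
        b, c, h, w = feat_map.shape
        tokens = feat_map.flatten(2).transpose(1, 2)  # (B, H*W, C) queries
        prompt = mmf.unsqueeze(1)                     # (B, 1, mmf_dim) key/value
        out, _ = self.attn(tokens, prompt, prompt)
        return (tokens + out).transpose(1, 2).reshape(b, c, h, w)

# Usage: degraded images paired with stand-in text embeddings.
imgs = torch.randn(8, 3, 64, 64)
text = F.normalize(torch.randn(8, 128), dim=-1)   # placeholder for CLIP text
enc = DegradationEncoder()
stage1_loss = info_nce(enc(imgs), text)           # train the MMF encoder
feats = torch.randn(8, 64, 16, 16)                # restoration-network features
guided = MMFCrossAttention()(feats, enc(imgs))    # inject MMFs via attention
```

In this sketch the InfoNCE objective plays the role of the abstract's "natural classifier": each text embedding acts as a class prototype for a degradation type, and in stage two every spatial token of the restoration features queries the single MMF prompt token through cross-attention.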
Pages: 11
Related Papers
50 records in total
  • [41] Optimized transfer learning based multi-modal medical image retrieval
    Abid, Muhammad Haris; Ashraf, Rehan; Mahmood, Toqeer; Faisal, C. M. Nadeem
    Multimedia Tools and Applications, 2024, 83: 44069-44100
  • [42] Split Learning of Multi-Modal Medical Image Classification
    Ghosh, Bishwamittra; Wang, Yuan; Fu, Huazhu; Wei, Qingsong; Liu, Yong; Goh, Rick Siow Mong
    2024 IEEE Conference on Artificial Intelligence (CAI 2024), 2024: 1326-1331
  • [43] Electromagnetic signal feature fusion and recognition based on multi-modal deep learning
    Hou, C.; Zhang, X.; Chen, X.
    International Journal of Performability Engineering, 2020, 16 (06): 941-949
  • [45] Learning discriminative motion feature for enhancing multi-modal action recognition
    Yang, Jianyu; Huang, Yao; Shao, Zhanpeng; Liu, Chunping
    Journal of Visual Communication and Image Representation, 2021, 79
  • [46] Multi-modal active learning with deep reinforcement learning for target feature extraction in multi-media image processing applications
    Dhiman, Gaurav; Kumar, A. Vignesh; Nirmalan, R.; Sujitha, S.; Srihari, K.; Yuvaraj, N.; Arulprakash, P.; Raja, R. Arshath
    Multimedia Tools and Applications, 2023, 82 (04): 5343-5367
  • [47] Learning Common and Transferable Feature Representations for Multi-Modal Data
    Nitsch, Julia; Nieto, Juan; Siegwart, Roland; Schmidt, Max; Cadena, Cesar
    2020 IEEE Intelligent Vehicles Symposium (IV), 2020: 1595-1601
  • [48] Non-deep CNN for Multi-Modal Image Classification and Feature Learning: An Azure-based Model
    Roychowdhury, Sohini; Ren, Johnny
    2016 IEEE International Conference on Big Data (Big Data), 2016: 2803-2812
  • [49] Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding
    Sun, Shengkai; Liu, Daizong; Dong, Jianfeng; Qu, Xiaoye; Gao, Junyu; Yang, Xun; Wang, Xun; Wang, Meng
    Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023: 2973-2984
  • [50] A Unified Deep Learning Framework for Multi-Modal Multi-Dimensional Data
    Xi, Pengcheng; Goubran, Rafik; Shu, Chang
    2019 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 2019