Leveraging vision-language prompts for real-world image restoration and enhancement

Cited by: 0
Authors
[1] Wei, Yanyan (affiliation 1)
[2] Zhang, Yilin
[3] Li, Kun
[4] Wang, Fei
[5] Tang, Shengeng
[6] Zhang, Zhao (affiliation 1)
Funding
National Natural Science Foundation of China
Keywords
Image denoising - Image enhancement - Image quality - Image reconstruction - Restoration - Weather modification
DOI
10.1016/j.cviu.2024.104222
Abstract
Significant advances have been made in image restoration methods that remove adverse weather effects. However, natural constraints make it difficult to collect real-world datasets for adverse weather removal. Consequently, existing methods rely predominantly on synthetic datasets, and models trained on them generalize poorly to real-world data, which limits their practical utility. Although some real-world adverse weather removal datasets have emerged, they capture the ground truths at a different moment from the degraded images, which inevitably introduces interfering discrepancies between the two: variations in brightness, color, and contrast, as well as minor misalignments. Moreover, real-world data typically contain complex, mixed degradations rather than a single degradation type, and in many samples the degradation features are not overt, which poses significant challenges to real-world adverse weather removal methods. To tackle these issues, we introduce CLIP, a recently prominent vision-language model, to aid the image restoration process. An expanded and fine-tuned CLIP model acts as an ‘expert’, leveraging the image priors it acquired through large-scale pre-training to guide the image restoration model. Additionally, we generate a set of pseudo-ground-truths from sequences of degraded images to further reduce the difficulty the model faces in fitting the data. To give the model more prior knowledge about degradation characteristics, we also incorporate additional synthetic training data. Finally, the progressive learning and fine-tuning strategies employed during training improve the model's final performance, enabling our method to surpass existing approaches in both visual quality and objective image quality assessment metrics. © 2024 Elsevier Inc.
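The abstract does not say how the fine-tuned CLIP "expert" guides restoration. One standard mechanism CLIP offers is zero-shot scoring: unit-normalize an image embedding and several text-prompt embeddings, take their cosine similarities, scale by a temperature (CLIP's learned logit scale is capped at 100), and apply a softmax. The sketch below illustrates only that generic mechanism and is not the authors' method. It assumes the embeddings were already produced by a CLIP encoder (toy 3-D vectors stand in for them here), and the prompt wording and the `conditioning_vector` helper, which turns the scores into a guidance signal a restoration network could consume, are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; CLIP compares unit-normalized embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def prompt_probabilities(image_emb: Vector, prompt_embs: List[Vector],
                         logit_scale: float = 100.0) -> List[float]:
    """Softmax over temperature-scaled cosine similarities between one image
    embedding and several text-prompt embeddings (CLIP-style zero-shot scoring)."""
    img = l2_normalize(image_emb)
    logits = []
    for p in prompt_embs:
        txt = l2_normalize(p)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    peak = max(logits)  # subtract the maximum logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditioning_vector(probs: List[float], prompt_embs: List[Vector]) -> Vector:
    """Hypothetical guidance signal: the probability-weighted sum of the
    normalized prompt embeddings, usable as an extra input to a restorer."""
    cond = [0.0] * len(prompt_embs[0])
    for w, p in zip(probs, prompt_embs):
        for i, x in enumerate(l2_normalize(p)):
            cond[i] += w * x
    return cond


if __name__ == "__main__":
    # Toy 3-D stand-ins for CLIP embeddings (ViT-B/32 embeddings are 512-D).
    prompts = ["a photo degraded by rain", "a photo degraded by haze", "a clean photo"]
    prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_emb = [0.9, 0.1, 0.2]
    probs = prompt_probabilities(image_emb, prompt_embs)
    for text, p in zip(prompts, probs):
        print(f"{text}: {p:.3f}")
    print("conditioning:", conditioning_vector(probs, prompt_embs))
```

The large temperature makes the distribution sharp, so an image that clearly matches one degradation prompt receives nearly all the weight. This is also why subtle, mixed real-world degradations are hard to score reliably with an off-the-shelf model.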