Leveraging vision-language prompts for real-world image restoration and enhancement

Cited by: 0
Authors
[1] Wei, Yanyan
[2] Zhang, Yilin
[3] Li, Kun
[4] Wang, Fei
[5] Tang, Shengeng
[6] Zhang, Zhao
Funding
National Natural Science Foundation of China
Keywords
Image denoising; Image enhancement; Image quality; Image reconstruction; Restoration; Weather modification
DOI
10.1016/j.cviu.2024.104222
Abstract
Significant advancements have been made in image restoration methods aimed at removing adverse weather effects. However, due to natural constraints, it is challenging to collect real-world datasets for adverse weather removal tasks. Consequently, existing methods predominantly rely on synthetic datasets, which struggle to generalize to real-world data, thereby limiting their practical utility. While some real-world adverse weather removal datasets have emerged, their design, which involves capturing ground truths at a different moment, inevitably introduces interfering discrepancies between the degraded images and the ground truths. These discrepancies include variations in brightness, color, contrast, and minor misalignments. Meanwhile, real-world datasets typically involve complex rather than singular degradation types. In many samples, degradation features are not overt, which poses immense challenges to real-world adverse weather removal methodologies. To tackle these issues, we introduce the recently prominent vision-language model, CLIP, to aid in the image restoration process. An expanded and fine-tuned CLIP model acts as an ‘expert’, leveraging the image priors acquired through large-scale pre-training to guide the operation of the image restoration model. Additionally, we generate a set of pseudo-ground-truths on sequences of degraded images to further alleviate the difficulty for the model in fitting the data. To imbue the model with more prior knowledge about degradation characteristics, we also incorporate additional synthetic training data. Lastly, the progressive learning and fine-tuning strategies employed during training enhance the model's final performance, enabling our method to surpass existing approaches in both visual quality and objective image quality assessment metrics. © 2024 Elsevier Inc.
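To illustrate the guidance mechanism the abstract describes, the sketch below shows one plausible way a frozen, CLIP-style image prior could steer a restoration network through feature-wise modulation. This is a minimal PyTorch sketch under assumed names: PriorEncoder, FiLMBlock, and GuidedRestorer are hypothetical stand-ins, not the authors' architecture, and the real method fine-tunes an expanded CLIP model and combines it with pseudo-ground-truths, synthetic data, and progressive training.

# Illustrative sketch (not the paper's code): a frozen "expert" prior conditioning
# a trainable restoration network via per-channel scale-and-shift modulation.
import torch
import torch.nn as nn

class PriorEncoder(nn.Module):
    """Stand-in for a (fine-tuned) CLIP image encoder that maps a degraded image
    to a global embedding summarizing its degradation characteristics."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x):
        feats = self.backbone(x).flatten(1)   # (B, 128)
        return self.proj(feats)               # (B, embed_dim)

class FiLMBlock(nn.Module):
    """The prior embedding produces per-channel scale and shift applied to the
    restoration features, followed by a residual refinement convolution."""
    def __init__(self, channels, embed_dim):
        super().__init__()
        self.to_scale_shift = nn.Linear(embed_dim, 2 * channels)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat, prior):
        scale, shift = self.to_scale_shift(prior).chunk(2, dim=1)
        feat = feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return feat + self.conv(torch.relu(feat))

class GuidedRestorer(nn.Module):
    """Toy restoration network guided by the prior embedding; predicts a residual
    correction that is added back to the degraded input."""
    def __init__(self, channels=64, embed_dim=512, n_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            [FiLMBlock(channels, embed_dim) for _ in range(n_blocks)]
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, degraded, prior):
        feat = self.head(degraded)
        for block in self.blocks:
            feat = block(feat, prior)
        return degraded + self.tail(feat)

if __name__ == "__main__":
    encoder, restorer = PriorEncoder(), GuidedRestorer()
    degraded = torch.rand(2, 3, 128, 128)     # batch of degraded images
    with torch.no_grad():                     # the 'expert' prior is kept frozen
        prior = encoder(degraded)
    restored = restorer(degraded, prior)
    print(restored.shape)                     # torch.Size([2, 3, 128, 128])

In the paper's setting the prior would come from a fine-tuned CLIP image encoder rather than the small stand-in above, but the conditioning pattern, frozen 'expert' features steering a trainable restorer, reflects the guidance idea the abstract outlines.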
Related papers
50 items
  • [1] Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
    Cheng, Kanzhi
    Song, Wenpo
    Ma, Zheng
    Zhu, Wenhao
    Zhu, Zixuan
    Zhang, Jianbing
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5038 - 5047
  • [2] ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data
    Varma, Maya
    Delbrouck, Jean-Benoit
    Hooper, Sarah
    Chaudhari, Akshay
    Langlotz, Curtis
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22168 - 22178
  • [3] VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
    Bitton, Yonatan
    Bansal, Hritik
    Hessel, Jack
    Shao, Rulin
    Zhu, Wanrong
    Awadalla, Anas
    Gardner, Josh
    Taori, Rohan
    Schmidt, Ludwig
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [4] Leveraging per Image-Token Consistency for Vision-Language Pre-training
    Gou, Yunhao
    Ko, Tom
    Yang, Hansi
    Kwok, James
    Zhang, Yu
    Wang, Mingxuan
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19155 - 19164
  • [5] ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
    Lin, Bingqian
    Zhu, Yi
    Chen, Zicong
    Liang, Xiwen
    Liu, Jianzhuang
    Liang, Xiaodan
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15375 - 15385
  • [6] Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks
    Wang, Wenhui
    Bao, Hangbo
    Dong, Li
    Bjorck, Johan
    Peng, Zhiliang
    Liu, Qiang
    Aggarwal, Kriti
    Mohammed, Owais Khan
    Singhal, Saksham
    Som, Subhojit
    Wei, Furu
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19175 - 19186
  • [7] Toward Real-world Panoramic Image Enhancement
    Zhang, Yupeng
    Zhang, Hengzhi
    Li, Daojing
    Liu, Liyan
    Yi, Hong
    Wang, Wei
    Suitoh, Hiroshi
    Odamaki, Makoto
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 2675 - 2684
  • [8] Mapping Language to Vision in a Real-World Robotic Scenario
    Stepanova, Karla
    Klein, Frederico Belmonte
    Cangelosi, Angelo
    Vavrecka, Michal
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2018, 10 (03) : 784 - 794
  • [9] Debiased Subjective Assessment of Real-World Image Enhancement
    Cao, Peibei
    Wang, Zhangyang
    Ma, Kede
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 711 - 721
  • [10] Image restoration for real-world under-display imaging
    Gao, KeMing
    Chang, Meng
    Jiang, Kunjun
    Wang, Yaxu
    Xu, Zhihai
    Feng, Huajun
    Li, Qi
    Hu, Zengxin
    Chen, YueTing
    [J]. OPTICS EXPRESS, 2021, 29 (23) : 37820 - 37834