Leveraging vision-language prompts for real-world image restoration and enhancement

Cited by: 0

Authors:

[1] Wei, Yanyan

[2] Zhang, Yilin

[3] Li, Kun

[4] Wang, Fei

[5] Tang, Shengeng

[6] Zhang, Zhao

Affiliations:

Source:

Zhang, Zhao (cszzhang@gmail.com) | 2025 / Vol. 250

Funding:

National Natural Science Foundation of China;

Keywords:

Image denoising - Image enhancement - Image quality - Image reconstruction - Restoration - Weather modification;

DOI:

10.1016/j.cviu.2024.104222

Chinese Library Classification number:

Subject classification number:

Abstract:

Significant advancements have been made in image restoration methods aimed at removing adverse weather effects. However, due to natural constraints, it is challenging to collect real-world datasets for adverse weather removal tasks. Consequently, existing methods predominantly rely on synthetic datasets, which struggle to generalize to real-world data, thereby limiting their practical utility. While some real-world adverse weather removal datasets have emerged, their design, which involves capturing ground truths at a different moment, inevitably introduces interfering discrepancies between the degraded images and the ground truths. These discrepancies include variations in brightness, color, contrast, and minor misalignments. Meanwhile, real-world datasets typically involve complex rather than singular degradation types. In many samples, degradation features are not overt, which poses immense challenges to real-world adverse weather removal methodologies. To tackle these issues, we introduce the recently prominent vision-language model, CLIP, to aid in the image restoration process. An expanded and fine-tuned CLIP model acts as an ‘expert’, leveraging the image priors acquired through large-scale pre-training to guide the operation of the image restoration model. Additionally, we generate a set of pseudo-ground-truths on sequences of degraded images to further alleviate the difficulty for the model in fitting the data. To imbue the model with more prior knowledge about degradation characteristics, we also incorporate additional synthetic training data. Lastly, the progressive learning and fine-tuning strategies employed during training enhance the model's final performance, enabling our method to surpass existing approaches in both visual quality and objective image quality assessment metrics. © 2024 Elsevier Inc.
