Leveraging vision-language prompts for real-world image restoration and enhancement

Cited by: 0

Authors:

Affiliations:

[1] Wei, Yanyan

[2] Zhang, Yilin

[3] Li, Kun

[4] Wang, Fei

[5] Tang, Shengeng

[6] Zhang, Zhao

Source:

Zhang, Zhao (cszzhang@gmail.com) | 2025 / Vol. 250

Funding:

National Natural Science Foundation of China;

Keywords:

Image denoising - Image enhancement - Image quality - Image reconstruction - Restoration - Weather modification;

DOI:

10.1016/j.cviu.2024.104222

Chinese Library Classification number:

Subject classification number:

Abstract:

Significant advancements have been made in image restoration methods aimed at removing adverse weather effects. However, due to natural constraints, it is challenging to collect real-world datasets for adverse weather removal tasks. Consequently, existing methods predominantly rely on synthetic datasets, which struggle to generalize to real-world data, thereby limiting their practical utility. While some real-world adverse weather removal datasets have emerged, their design, which involves capturing ground truths at a different moment, inevitably introduces interfering discrepancies between the degraded images and the ground truths. These discrepancies include variations in brightness, color, contrast, and minor misalignments. Meanwhile, real-world datasets typically involve complex rather than singular degradation types. In many samples, degradation features are not overt, which poses immense challenges to real-world adverse weather removal methodologies. To tackle these issues, we introduce the recently prominent vision-language model, CLIP, to aid in the image restoration process. An expanded and fine-tuned CLIP model acts as an ‘expert’, leveraging the image priors acquired through large-scale pre-training to guide the operation of the image restoration model. Additionally, we generate a set of pseudo-ground-truths on sequences of degraded images to further alleviate the difficulty for the model in fitting the data. To imbue the model with more prior knowledge about degradation characteristics, we also incorporate additional synthetic training data. Lastly, the progressive learning and fine-tuning strategies employed during training enhance the model's final performance, enabling our method to surpass existing approaches in both visual quality and objective image quality assessment metrics. © 2024 Elsevier Inc.


50 items in total

[41] Leveraging Real-World Evidence to Enhance Clinical Trials
Borkar, Durga S.
Parke II, David W.
Lee, Aaron Y.
OPHTHALMOLOGY, 2024, 131 (07) : 756 - 758
[42] A Virtual Restoration Stage for Real-World Objects
Aliaga, Daniel G.
Law, Alvin J.
Yeung, Yu Hong
ACM TRANSACTIONS ON GRAPHICS, 2008, 27 (05):
[43] Real-World Underwater Image Enhancement Based on Attention U-Net
Tang, Pengfei
Li, Liangliang
Xue, Yuan
Lv, Ming
Jia, Zhenhong
Ma, Hongbing
JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2023, 11 (03)
[44] An Advanced Single-Image Visibility Restoration Algorithm for Real-World Hazy Scenes
Huang, Shih-Chia
Ye, Jian-Hui
Chen, Bo-Hao
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2015, 62 (05) : 2962 - 2972
[45] UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic Cross-Modal Learnable Prompts
Zhan, Chenlu
Zhang, Yufei
Lin, Yu
Wang, Gaoang
Wang, Hongwei
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9736 - 9748
[46] IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models
Chen, Zhihao
Hu, Bin
Niu, Chuang
Chen, Tao
Li, Yuxin
Shan, Hongming
Wang, Ge
VISUAL COMPUTING FOR INDUSTRY BIOMEDICINE AND ART, 2024, 7 (01)
[47] VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Ke, Junjie
Ye, Keren
Yu, Jiahui
Wu, Yonghui
Milanfar, Peyman
Yang, Feng
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10041 - 10051
[48] Vision-language joint representation learning for sketch less facial image retrieval
Dai, Dawei
Fu, Shiyu
Liu, Yingge
Wang, Guoyin
INFORMATION FUSION, 2024, 112
[49] ProVLA: Compositional Image Search with Progressive Vision-Language Alignment and Multimodal Fusion
Hu, Zhizhang
Zhu, Xinliang
Tran, Son
Vidal, Rene
Dhua, Arnab
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2764 - 2769
[50] A real-world vision system: Mechanism, control, and vision processing
Dankers, A
Zelinsky, A
COMPUTER VISION SYSTEMS, PROCEEDINGS, 2003, 2626 : 223 - 235
