How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

Cited by: 4
Authors
Ming, Yifei [1 ]
Li, Yixuan [1 ]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53715 USA
Funding
U.S. National Science Foundation
Keywords
CLIP; OOD detection; Fine-tuning; Multi-modality; Vision-language models; Prompt learning; Few-shot learning; Adaptor
DOI
10.1007/s11263-023-01895-7
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent large vision-language models such as CLIP have shown remarkable out-of-distribution (OOD) detection and generalization performance. However, their zero-shot in-distribution (ID) accuracy is often limited on downstream datasets. Recent CLIP-based fine-tuning methods such as prompt learning have demonstrated significant improvements in ID classification and OOD generalization when OOD labels are available. Nonetheless, it remains unclear whether the model is reliable under semantic shifts without OOD labels. In this paper, we aim to bridge this gap and present a comprehensive study of how fine-tuning impacts OOD detection for few-shot downstream tasks. By framing OOD detection as multi-modal concept matching, we establish a connection between fine-tuning methods and various OOD scores. Our results suggest that a proper choice of OOD score is essential for CLIP-based fine-tuning. In particular, the maximum concept matching (MCM) score consistently provides a promising solution. We also show that prompt learning achieves state-of-the-art OOD detection performance, surpassing the zero-shot counterpart.
Pages: 596-609 (14 pages)
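
The abstract frames OOD detection as matching an image against textual class concepts, with the maximum concept matching (MCM) score as the detector. Below is a minimal NumPy sketch of such a softmax-over-cosine-similarity score, assuming image_feat and text_feats come from a CLIP-style image/text encoder pair; the function name and default temperature are illustrative, not taken from the paper.

```python
import numpy as np

def mcm_score(image_feat, text_feats, tau=1.0):
    """Sketch of a maximum concept matching (MCM) style score.

    image_feat: (d,) image embedding (e.g., from a CLIP image encoder).
    text_feats: (C, d) embeddings of C class-concept prompts
                (e.g., "a photo of a <class>").
    tau: softmax temperature (value here is illustrative).

    Returns the maximum softmax probability over cosine similarities;
    lower scores suggest the input is out-of-distribution.
    """
    # L2-normalize so dot products equal cosine similarities.
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)

    sims = txt @ img                       # (C,) cosine similarities
    logits = sims / tau
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs.max()
```

A typical use would flag an input as OOD when mcm_score(...) falls below a threshold chosen on held-out ID data; the same scoring function applies unchanged to zero-shot CLIP and to its prompt-learned or adaptor-based fine-tuned variants, which is what makes the score a common yardstick across fine-tuning methods.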