Robust Fine-Tuning of Vision-Language Models for Domain Generalization

Citations: 0

Authors
Vogt-Lowell, Kevin [1 ]
Lee, Noah [2 ]
Tsiligkaridis, Theodoros [1 ]
Vaillant, Marc [2 ]
Affiliations
[1] MIT Lincoln Lab, Artificial Intelligence Technol, Lexington, MA 02421 USA
[2] MIT Lincoln Lab, Homeland Sensors & Analyt, Lexington, MA USA
Keywords
foundation model; vision-language model; CLIP; fine-tuning; distribution shift; out-of-distribution robustness;
DOI
10.1109/HPEC58863.2023.10363450
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Transfer learning enables the sharing of common knowledge among models for a variety of downstream tasks, but traditional methods suffer in limited training data settings and produce narrow models incapable of effectively generalizing under distribution shifts. Foundation models have recently demonstrated impressive zero-shot inference capabilities and robustness under distribution shifts. However, zero-shot evaluation for these models has been predominantly confined to benchmarks with simple distribution shifts, limiting our understanding of their effectiveness under the more realistic shifts found in practice. Moreover, common fine-tuning methods for these models have yet to be evaluated against vision models in few-shot scenarios where training data is limited. To address these gaps, we present a new recipe for few-shot fine-tuning of the popular vision-language foundation model CLIP and evaluate its performance on challenging benchmark datasets with realistic distribution shifts from the WILDS collection. Our experimentation demonstrates that, while zero-shot CLIP fails to match the performance of trained vision models on more complex benchmarks, few-shot CLIP fine-tuning outperforms its vision-only counterparts in terms of in-distribution and out-of-distribution accuracy at all levels of training data availability. This provides a strong incentive for the adoption of foundation models within few-shot learning applications operating with real-world data.
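The contrast the abstract draws between zero-shot CLIP classification and few-shot adaptation can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: random unit vectors stand in for CLIP's frozen image/text embeddings, and the few-shot step is a linear probe initialized from the zero-shot text weights, which is one common CLIP adaptation recipe rather than the paper's exact fine-tuning method.

```python
import math
import random

random.seed(0)
DIM, N_CLASSES, SHOTS = 16, 2, 4

def unit(v):
    # L2-normalize a vector, as CLIP does with its embeddings
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def randn(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical stand-ins for the text embeddings of class prompts
# like "a photo of a {class}" from a frozen CLIP text tower.
text_emb = [unit(randn(DIM)) for _ in range(N_CLASSES)]

def sample_images(n_per_class, noise=0.4):
    # Synthetic "image embeddings": noisy copies of each class prototype,
    # mimicking image features that roughly align with their text prompt.
    X, y = [], []
    for c in range(N_CLASSES):
        for _ in range(n_per_class):
            X.append(unit([t + noise * g for t, g in zip(text_emb[c], randn(DIM))]))
            y.append(c)
    return X, y

X_train, y_train = sample_images(SHOTS)   # few-shot training set
X_test, y_test = sample_images(50)

def accuracy(W, X, y):
    correct = 0
    for xi, yi in zip(X, y):
        scores = [dot(w, xi) for w in W]
        correct += scores.index(max(scores)) == yi
    return correct / len(y)

# Zero-shot classification: cosine similarity to each class's text embedding.
zero_shot = accuracy(text_emb, X_test, y_test)

# Few-shot adaptation: train a linear probe on the frozen features with
# softmax cross-entropy, initializing the weights from the zero-shot text
# embeddings so training starts from the zero-shot classifier.
W = [list(w) for w in text_emb]
lr = 0.5
for _ in range(200):
    grads = [[0.0] * DIM for _ in range(N_CLASSES)]
    for xi, yi in zip(X_train, y_train):
        logits = [dot(w, xi) for w in W]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        Z = sum(exps)
        for c in range(N_CLASSES):
            err = exps[c] / Z - (1.0 if c == yi else 0.0)
            for d in range(DIM):
                grads[c][d] += err * xi[d]
    for c in range(N_CLASSES):
        for d in range(DIM):
            W[c][d] -= lr * grads[c][d] / len(y_train)

few_shot = accuracy(W, X_test, y_test)
print(f"zero-shot accuracy: {zero_shot:.2f}, few-shot probe accuracy: {few_shot:.2f}")
```

In a real setting the embeddings would come from CLIP's pretrained towers, and the paper's recipe additionally contends with realistic distribution shift between the few-shot training data and the evaluation sets; this sketch only shows the mechanical relationship between the zero-shot classifier and a few-shot head built on top of it.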
Pages: 7