Black Box Few-Shot Adaptation for Vision-Language Models

Cited by: 5
Authors
Ouali, Yassine [1 ]
Bulat, Adrian [1 ]
Martinez, Brais [1 ]
Tzimiropoulos, Georgios [1 ,2 ]
Affiliations
[1] Samsung AI Cambridge, Cambridge, England
[2] Queen Mary Univ London, London, England
Keywords
SHAPE;
DOI
10.1109/ICCV51070.2023.01424
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation, aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the model weights and can be computationally infeasible for large models with billions of parameters. To address these shortcomings, in this work we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features, and hence works without access to the model's weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed by unimodal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. LFA is initialized from a closed-form solution to a least-squares problem and then iteratively updated by minimizing a re-ranking loss. Despite its simplicity, our approach can even surpass soft-prompt learning methods, as shown by extensive experiments on 11 image and 2 video datasets. Code available at: https://github.com/saic-fi/LFA
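As a rough illustration of the black-box setting described in the abstract, the closed-form least-squares initialization can be sketched on toy pre-computed features. All names, dimensions, and data below are illustrative assumptions, not the paper's actual implementation (which also applies iterative refinement with a re-ranking loss; see the linked repository for the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for pre-computed features: in the black-box setting,
# only image/text features are available, not the model's weights.
n_classes, n_shots, dim = 5, 4, 64
text_feats = rng.normal(size=(n_classes, dim))            # one text embedding per class
image_feats = rng.normal(size=(n_classes * n_shots, dim)) # few-shot image features
labels = np.repeat(np.arange(n_classes), n_shots)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

X = l2_normalize(image_feats)         # (N, d) image features
Y = l2_normalize(text_feats[labels])  # (N, d) matching class text features

# Closed-form least-squares initialization of the linear map:
# W = argmin_W ||X W - Y||_F^2, solved here via the pseudoinverse.
W = np.linalg.pinv(X) @ Y

# Classification: project image features through W and rank the class
# text embeddings by cosine similarity.
logits = l2_normalize(X @ W) @ l2_normalize(text_feats).T
preds = logits.argmax(axis=1)
print("train accuracy:", (preds == labels).mean())
```

On real CLIP-style features, the resulting `W` would serve only as the starting point; the method then refines it iteratively to improve the ranking of the correct class against hard negatives.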
Pages: 15488-15500
Page count: 13
Related Papers
50 items total
  • [41] Black-box Prompt Tuning for Vision-Language Model as a Service
    Yu, Lang
    Chen, Qin
    Lin, Jiaju
    He, Liang
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1686 - 1694
  • [42] Label Propagation for Zero-shot Classification with Vision-Language Models
    Stojnic, Vladan
    Kalantidis, Yannis
    Tolias, Giorgos
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 23209 - 23218
  • [43] FILP-3D: Enhancing 3D few-shot class-incremental learning with pre-trained vision-language models
    Xu, Wan
    Huang, Tianyu
    Qu, Tianyuan
    Yang, Guanglei
    Guo, Yiwen
    Zuo, Wangmeng
    PATTERN RECOGNITION, 2025, 165
  • [44] WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
    Gao, Heting
    Ni, Junrui
    Qian, Kaizhi
    Zhang, Yang
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    INTERSPEECH 2022, 2022, : 2738 - 2742
  • [45] Few-shot adaptation of multi-modal foundation models: a survey
    Liu, Fan
    Zhang, Tianshu
    Dai, Wenwen
    Zhang, Chuanyi
    Cai, Wenwen
    Zhou, Xiaocong
    Chen, Delong
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (10)
  • [46] Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
    Park, Keon-Hee
    Song, Kyungwoo
    Park, Gyeong-Moon
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 23881 - 23890
  • [47] Fairness-guided Few-shot Prompting for Large Language Models
    Ma, Huan
    Zhang, Changqing
    Bian, Yatao
    Liu, Lemao
    Zhang, Zhirui
    Zhao, Peilin
    Zhang, Shu
    Fu, Huazhu
    Hu, Qinghua
    Wu, Bingzhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [48] LLaFS: When Large Language Models Meet Few-Shot Segmentation
    Zhu, Lanyun
    Chen, Tianrun
    Ji, Deyi
    Ye, Jieping
    Liu, Jun
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 3065 - 3075
  • [49] Political Bias of Large Language Models in Few-Shot News Summarization
    Onishi, Takeshi
    Caverlee, James
    ADVANCES IN BIAS AND FAIRNESS IN INFORMATION RETRIEVAL, BIAS 2024, 2025, 2227 : 32 - 45
  • [50] Adapting Language-Audio Models as Few-Shot Audio Learners
    Liang, Jinhua
    Liu, Xubo
    Liu, Haohe
    Phan, Huy
    Benetos, Emmanouil
    Plumbley, Mark D.
    Wang, Wenwu
    INTERSPEECH 2023, 2023, : 276 - 280