Black Box Few-Shot Adaptation for Vision-Language Models

Cited by: 5
Authors
Ouali, Yassine [1]
Bulat, Adrian [1]
Martinez, Brais [1]
Tzimiropoulos, Georgios [1,2]
Affiliations
[1] Samsung AI Cambridge, Cambridge, England
[2] Queen Mary Univ London, London, England
Keywords
SHAPE;
DOI
10.1109/ICCV51070.2023.01424
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation, aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the model weights and can be computationally infeasible for large models with billions of parameters. To address these shortcomings, in this work, we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features, and hence works without access to the model's weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed from uni-modal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. LFA is initialized from a closed-form solution to a least-squares problem and is then iteratively updated by minimizing a re-ranking loss. Despite its simplicity, our approach can even surpass soft-prompt learning methods, as shown by extensive experiments on 11 image and 2 video datasets. Code available at: https://github.com/saic-fi/LFA
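The closed-form initialization the abstract describes can be sketched as a least-squares fit between pre-computed features. The toy data, dimensions, and helper name `lfa_init` below are illustrative assumptions, not the authors' implementation (which additionally refines the map with an iterative re-ranking loss):

```python
import numpy as np

def lfa_init(image_feats, text_feats):
    """Closed-form least-squares initialization of a linear map W that
    aligns pre-computed image features to their class text embeddings:
    solves min_W ||X W - T||_F^2 (via lstsq for numerical stability)."""
    W, *_ = np.linalg.lstsq(image_feats, text_feats, rcond=None)
    return W

# Toy few-shot setup: 5 classes, 16-dim features, 40 labeled samples.
rng = np.random.default_rng(0)
num_classes, dim, shots = 5, 16, 40
class_text = rng.normal(size=(num_classes, dim))   # one text embedding per class
labels = rng.integers(0, num_classes, size=shots)  # few-shot labels
# Noisy "image" features drawn around each sample's class text embedding.
X = class_text[labels] + 0.1 * rng.normal(size=(shots, dim))

# Fit W mapping each image feature to its class text embedding,
# then classify by the nearest text embedding after alignment.
W = lfa_init(X, class_text[labels])
pred = np.argmax(X @ W @ class_text.T, axis=1)
acc = (pred == labels).mean()
```

Because both the features and the targets are fixed vectors, this step needs no access to the V-L model's weights and no backpropagation, which is what makes the approach black-box and fast to train.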
Pages: 15488-15500
Page count: 13
Related Papers
50 in total
  • [31] Large Language Models for Few-Shot Automatic Term Extraction
    Banerjee, Shubhanker
    Chakravarthi, Bharathi Raja
    McCrae, John Philip
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 137 - 150
  • [32] Large Language Models (LLMs) Enable Few-Shot Clustering
    Viswanathan, Vijay
    Gashteovski, Kiril
    Lawrence, Carolin
    Wu, Tongshuang
    Neubig, Graham
    NEC Technical Journal, 2024, 17 (02): 80 - 90
  • [33] Unsupervised and few-shot parsing from pretrained language models
    Zeng, Zhiyuan
    Xiong, Deyi
    ARTIFICIAL INTELLIGENCE, 2022, 305
  • [34] Task Contamination: Language Models May Not Be Few-Shot Anymore
    Li, Changmao
    Flanigan, Jeffrey
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18471 - 18480
  • [35] ATLAS: Few-shot Learning with Retrieval Augmented Language Models
    Izacard, Gautier
    Lewis, Patrick
    Lomeli, Maria
    Hosseini, Lucas
    Petroni, Fabio
    Schick, Timo
    Dwivedi-Yu, Jane
    Joulin, Armand
    Riedel, Sebastian
    Grave, Edouard
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [36] Constrained Language Models Yield Few-Shot Semantic Parsers
    Shin, Richard
    Lin, Christopher H.
    Thomson, Sam
    Chen, Charles
    Roy, Subhro
    Platanios, Emmanouil Antonios
    Pauls, Adam
    Klein, Dan
    Eisner, Jason
    Van Durme, Benjamin
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 7699 - 7715
  • [37] Getting to Production with Few-shot Natural Language Generation Models
    Heidari, Peyman
    Einolghozati, Arash
    Jain, Shashank
    Batra, Soumya
    Callender, Lee
    Arun, Ankit
    Mei, Shawn
    Gupta, Sonal
    Donmez, Pinar
    Bhardwaj, Vikas
    Kumar, Anuj
    White, Michael
    SIGDIAL 2021: 22ND ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2021), 2021, : 66 - 76
  • [38] Learning Meta Soft Prompt for Few-Shot Language Models
    Chien, Jen-Tzung
    Chen, Ming-Yen
    Xue, Jing-Hao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 57 - 62
  • [39] Few-Shot Semantic Parsing with Language Models Trained on Code
    Shin, Richard
    Van Durme, Benjamin
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5417 - 5425
  • [40] Few-Shot Adversarial Domain Adaptation
    Motiian, Saeid
    Jones, Quinn
    Iranmanesh, Seyed Mehdi
    Doretto, Gianfranco
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30