GalLoP: Learning Global and Local Prompts for Vision-Language Models

Cited by: 0
Authors
Lafon, Marc [1 ]
Ramzi, Elias [1 ]
Rambour, Clement [1 ]
Audebert, Nicolas [1 ,2 ]
Thome, Nicolas [3 ]
Affiliations
[1] Conservatoire Natl Arts & Metiers, CEDRIC, F-75141 Paris, France
[2] Univ Gustave Eiffel, IGN, LASTIG, ENSG, F-94160 St Mande, France
[3] Sorbonne Univ, CNRS, ISIR, F-75005 Paris, France
Source
Keywords
Vision-language models; Few shot classification; Prompt learning; Local and global prompts; Robustness; OOD detection;
DOI
10.1007/978-3-031-73030-6_15
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs), e.g. CLIP, for few-shot image classification. Despite their success, most prompt learning methods trade off classification accuracy against robustness, e.g. in domain generalization or out-of-distribution (OOD) detection. In this work, we introduce Global-Local Prompts (GalLoP), a new prompt learning method that learns multiple diverse prompts leveraging both global and local visual features. The training of the local prompts relies on local features with an enhanced vision-text alignment. To focus only on pertinent features, this local alignment is coupled with a sparsity strategy in the selection of the local features. We enforce diversity on the set of prompts using a new "prompt dropout" technique and a multiscale strategy on the local prompts. GalLoP outperforms previous prompt learning methods in accuracy on eleven datasets across different few-shot settings and with various backbones. Furthermore, GalLoP shows strong robustness in both domain generalization and OOD detection, even outperforming dedicated OOD detection methods. Code and instructions to reproduce our results will be open-sourced.
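To make the abstract's global/local idea concrete, the following is a minimal, hypothetical sketch (not the authors' GalLoP code): it scores an image against a "global" prompt via the image-level feature, scores it against a "local" prompt via patch features with a top-k sparse selection, and randomly drops one of the two prompt branches during training as an illustrative stand-in for prompt dropout. The function name, tensor shapes, top-k rule, and dropout formulation are all assumptions for illustration.

# Minimal sketch, assuming precomputed CLIP-like features; not the authors' implementation.
import torch
import torch.nn.functional as F

def gallop_like_logits(global_feat, local_feats, global_text, local_text,
                       k=5, prompt_keep_prob=0.75, training=True):
    """global_feat: (B, D) image-level feature; local_feats: (B, P, D) patch features.
    global_text, local_text: (C, D) class embeddings from two learned prompt sets."""
    # Cosine-similarity logits for the global prompt.
    g = F.normalize(global_feat, dim=-1) @ F.normalize(global_text, dim=-1).t()   # (B, C)

    # Local alignment: patch-to-class similarities, keeping only the top-k most
    # similar patches per class (a sparse selection of pertinent local features).
    l = F.normalize(local_feats, dim=-1) @ F.normalize(local_text, dim=-1).t()    # (B, P, C)
    topk = l.topk(k, dim=1).values.mean(dim=1)                                     # (B, C)

    # Illustrative "prompt dropout": randomly drop one of the two prompt scores
    # during training so neither branch dominates; average both at inference.
    if training:
        keep = (torch.rand(2) < prompt_keep_prob).float()
        keep = keep if keep.sum() > 0 else torch.ones(2)   # keep at least one branch
        return (keep[0] * g + keep[1] * topk) / keep.sum()
    return 0.5 * (g + topk)

# Usage with dummy tensors standing in for CLIP features
# (B=2 images, P=196 patches, D=512 dims, C=10 classes).
logits = gallop_like_logits(torch.randn(2, 512), torch.randn(2, 196, 512),
                            torch.randn(10, 512), torch.randn(10, 512))
print(logits.shape)  # torch.Size([2, 10])

Averaging the top-k patch similarities is one common way to aggregate local scores; the paper's actual aggregation, multiscale strategy, and prompt-dropout scheme may differ from this sketch.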
Pages: 264 - 282
Page count: 19
Related Papers
50 records in total
  • [31] Adversarial Prompt Tuning for Vision-Language Models
    Zhang, Jiaming
    Ma, Xingjun
    Wang, Xin
    Qiu, Lingyu
    Wang, Jiaqi
    Jiang, Yu-Gang
    Sang, Jitao
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
  • [32] Task Bias in Contrastive Vision-Language Models
    Menon, Sachit
    Chandratreya, Ishaan Preetam
    Vondrick, Carl
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (06) : 2026 - 2040
  • [33] Task Residual for Tuning Vision-Language Models
    Yu, Tao
    Lu, Zhihe
    Jin, Xin
    Chen, Zhibo
    Wang, Xinchao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10899 - 10909
  • [34] Perceptual Grouping in Contrastive Vision-Language Models
    Ranasinghe, Kanchana
    McKinzie, Brandon
    Ravi, Sachin
    Yang, Yinfei
    Toshev, Alexander
    Shlens, Jonathon
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5548 - 5561
  • [35] Adventures of Trustworthy Vision-Language Models: A Survey
    Vatsa, Mayank
    Jain, Anubhooti
    Singh, Richa
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22650 - 22658
  • [36] Equivariant Similarity for Vision-Language Foundation Models
    Wang, Tan
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Yang, Zhengyuan
    Zhang, Hanwang
    Liu, Zicheng
    Wang, Lijuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11964 - 11974
  • [37] Leveraging vision-language prompts for real-world image restoration and enhancement
    Wei, Yanyan
    Zhang, Yilin
    Li, Kun
    Wang, Fei
    Tang, Shengeng
    Zhang, Zhao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 250
  • [38] Federated fine-grained prompts for vision-language models based on open-vocabulary object detection
    Li, Yu
    APPLIED INTELLIGENCE, 2025, 55 (07)
  • [39] Towards Better Vision-Inspired Vision-Language Models
    Cao, Yun-Hao
    Ji, Kaixiang
    Huang, Ziyuan
    Zheng, Chuanyang
    Liu, Jiajia
    Wang, Jian
    Chen, Jingdong
    Yang, Ming
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13537 - 13547
  • [40] Task-to-Instance Prompt Learning for Vision-Language Models at Test Time
    Lu, Zhihe
    Bai, Jiawang
    Li, Xin
    Xiao, Zeyu
    Wang, Xinchao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 1908 - 1920