Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Cited by: 0
Authors
Ding, Kun [1 ,3 ]
Zhang, Haojian [2 ]
Yu, Qiang [3 ]
Wang, Ying [1 ,3 ]
Xiang, Shiming [1 ,3 ]
Pan, Chunhong [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Engn Lab Intelligent Ind Vis, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We propose a generalized method for boosting the generalization ability of pre-trained vision-language models (VLMs) while fine-tuning on downstream few-shot tasks. The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to the base distribution or a novel distribution, and then using the score produced by a dedicated competition-based scoring function to fuse the zero-shot and few-shot classifiers. The fused classifier is dynamic: it biases toward the zero-shot classifier when a sample is more likely drawn from the distribution the model was pre-trained on, which improves base-to-novel generalization. Our method runs only at test time, so it can boost existing methods without time-consuming retraining. Extensive experiments show that even weak distribution detectors can improve the generalization ability of VLMs. Specifically, with the help of OOD detectors, the harmonic means of CoOp (Zhou et al. 2022b) and ProGrad (Zhu et al. 2023) increase by 2.6 and 1.5 percentage points, respectively, over 11 recognition datasets in the base-to-novel setting.
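For intuition, the minimal sketch below illustrates the kind of test-time fusion the abstract describes: a weak distribution detector yields a per-sample score that mixes the zero-shot and few-shot (prompt-tuned) classifier logits. The max-softmax confidence used as the detector, the function name fuse_logits, and the specific mixing rule are illustrative assumptions, not the paper's exact competition-based scoring function.

```python
import torch
import torch.nn.functional as F

def fuse_logits(zs_logits: torch.Tensor, fs_logits: torch.Tensor) -> torch.Tensor:
    """Dynamically fuse zero-shot and few-shot (prompt-tuned) logits.

    A simple max-softmax-probability score stands in for a weak distribution
    detector: higher few-shot confidence treats the sample as coming from the
    base (fine-tuned) distribution, higher zero-shot confidence as novel.
    """
    # Per-sample confidence of each classifier (shape: [batch]).
    zs_conf = F.softmax(zs_logits, dim=-1).max(dim=-1).values
    fs_conf = F.softmax(fs_logits, dim=-1).max(dim=-1).values

    # The two confidences "compete" for the mixing weight:
    # w -> 1 favors the few-shot classifier (base distribution),
    # w -> 0 favors the zero-shot classifier (novel distribution).
    w = fs_conf / (fs_conf + zs_conf + 1e-8)

    return w.unsqueeze(-1) * fs_logits + (1.0 - w).unsqueeze(-1) * zs_logits

# Usage with random tensors standing in for CLIP-style logits:
zs = torch.randn(4, 100)   # zero-shot classifier logits
fs = torch.randn(4, 100)   # few-shot (prompt-tuned) classifier logits
fused = fuse_logits(zs, fs)
print(fused.shape)          # torch.Size([4, 100])
```

Because the fusion touches only logits at inference, a sketch like this can wrap any existing prompt-tuned classifier without retraining, which is the property the abstract emphasizes.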
Pages: 1528-1536
Page count: 9
Related Papers
50 in total
  • [1] Distribution-Aware Prompt Tuning for Vision-Language Models
    Cho, Eulrang
    Kim, Jooyeon
    Kim, Hyunwoo J.
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21947 - 21956
  • [2] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
    Ma, Chengcheng
    Liu, Yang
    Deng, Jiankang
    Xie, Lingxi
    Dong, Weiming
    Xu, Changsheng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4616 - 4629
  • [3] Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
    Kan, Baoshuo
    Wang, Teng
    Lu, Wenpeng
    Zhen, Xiantong
    Guan, Weili
    Zheng, Feng
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15624 - 15634
  • [4] Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
    Wu, Cheng-En
    Tian, Yu
    Yu, Haichao
    Wang, Heng
    Morgado, Pedro
    Hu, Yu Hen
    Yang, Linjie
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15442 - 15451
  • [5] Black-box Prompt Tuning for Vision-Language Model as a Service
    Yu, Lang
    Chen, Qin
    Lin, Jiaju
    He, Liang
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1686 - 1694
  • [6] MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion
    Fan, Hao
    Ma, Zhaoyang
    Li, Yong
    Tian, Rui
    Chen, Yunli
    Gao, Chenlong
    [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024, 2024, 14870 : 328 - 339
  • [7] Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
    Li, Xuanlin
    Fang, Yunhao
    Liu, Minghua
    Ling, Zhan
    Tu, Zhuowen
    Su, Hao
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2492 - 2503
  • [8] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [9] Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
    Xing, Yinghui
    Wu, Qirui
    Cheng, De
    Zhang, Shizhou
    Liang, Guoqiang
    Wang, Peng
    Zhang, Yanning
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2056 - 2068