Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

Cited by: 10
Authors
Ma, Chengcheng [1 ,2 ]
Liu, Yang [3 ]
Deng, Jiankang [4 ]
Xie, Lingxi [4 ]
Dong, Weiming [1 ]
Xu, Changsheng [1 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Alibaba DAMO Acad, Hangzhou 310024, Peoples R China
[4] Huawei Inc, Shenzhen 518129, Peoples R China
Funding
Beijing Natural Science Foundation; National Science Foundation (US);
Keywords
Vision-language model; prompt tuning; over-fitting; subspace learning; gradient projection;
DOI
10.1109/TCSVT.2023.3245584
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability on downstream vision tasks given appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has recently been proposed to learn continuous prompts from task-specific training data. Despite the performance improvements on downstream tasks, several studies have reported that CoOp suffers from overfitting in two respects: (i) test accuracy on base classes first improves and then worsens during training; (ii) test accuracy on novel classes keeps decreasing. However, no existing study has explained or mitigated these overfitting problems. In this study, we first explore the cause of overfitting by analyzing the gradient flow. Comparative experiments reveal that CoOp favors generalizable features in the early training stage and spurious features in the later stage, which accounts for the non-overfitting and overfitting phenomena, respectively. Given these observations, we propose Subspace Prompt Tuning (SubPT), which projects the back-propagated gradients onto the low-rank subspace spanned by the eigenvectors of the early-stage gradient flow throughout training, and thereby eliminates the overfitting problem. In addition, we equip CoOp with a Novel Feature Learner (NFL) to enhance the generalization of the learned prompts to novel categories beyond the training set, without requiring any image training data. Extensive experiments on 11 classification datasets demonstrate that SubPT+NFL consistently boosts the performance of CoOp and outperforms the state-of-the-art CoCoOp approach. Experiments on more challenging downstream vision tasks, including open-vocabulary object detection and zero-shot semantic segmentation, further verify the effectiveness of the proposed method. Code can be found at https://tinyurl.com/mpe64f89.
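Per the abstract, SubPT constrains training by projecting each back-propagated gradient onto a low-rank subspace spanned by eigenvectors of the early-stage gradient flow. The following NumPy sketch illustrates that core operation only; the function names, the SVD-based basis construction, and the toy dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def top_k_subspace(early_grads, k):
    # Stack early-stage gradients (one flattened gradient per row).
    # The top-k right singular vectors of this matrix are the leading
    # eigenvectors of the gradient covariance, i.e. the subspace basis.
    G = np.stack(early_grads)                      # shape: (steps, dim)
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:k]                                  # shape: (k, dim), orthonormal rows

def project(grad, basis):
    # Project a gradient onto the subspace spanned by the rows of `basis`.
    return basis.T @ (basis @ grad)

# Toy usage: build a rank-2 subspace from 5 early gradients of dimension 8,
# then project a later gradient onto it before the parameter update.
rng = np.random.default_rng(0)
early = [rng.standard_normal(8) for _ in range(5)]
B = top_k_subspace(early, k=2)
g = rng.standard_normal(8)
g_proj = project(g, B)
# Projection is idempotent: projecting again leaves the result unchanged.
assert np.allclose(project(g_proj, B), g_proj)
```

In a real training loop this projection would be applied to the prompt parameters' gradients at every step after the basis is fixed, e.g. via a gradient hook before the optimizer update.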
Pages: 4616-4629 (14 pages)
Related Papers
50 in total
  • [1] Distribution-Aware Prompt Tuning for Vision-Language Models
    Cho, Eulrang
    Kim, Jooyeon
    Kim, Hyunwoo J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21947 - 21956
  • [2] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [3] Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
    Kan, Baoshuo
    Wang, Teng
    Lu, Wenpeng
    Zhen, Xiantong
    Guan, Weili
    Zheng, Feng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15624 - 15634
  • [5] Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
    Wu, Cheng-En
    Tian, Yu
    Yu, Haichao
    Wang, Heng
    Morgado, Pedro
    Hu, Yu Hen
    Yang, Linjie
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15442 - 15451
  • [6] CTPT: Continual Test-time Prompt Tuning for vision-language models
    Wang, Fan
    Han, Zhongyi
    Liu, Xingbo
    Yin, Yilong
    Gao, Xin
    PATTERN RECOGNITION, 2025, 161
  • [7] CPT: Colorful Prompt Tuning for pre-trained vision-language models
    Yao, Yuan
    Zhang, Ao
    Zhang, Zhengyan
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    AI OPEN, 2024, 5 : 30 - 38
  • [8] Conditional Prompt Learning for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804
  • [9] Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices
    Cai, Siqi
    Liu, Xuan
    Yuan, Jingling
    Zhou, Qihua
    PATTERN RECOGNITION, 2025, 163
  • [10] CoPL: Contextual Prompt Learning for Vision-Language Understanding
    Goswami, Koustava
    Karanam, Srikrishna
    Udhayanan, Prateksha
    Joseph, K. J.
    Srinivasan, Balaji Vasan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18090 - 18098