Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Cited by: 59
Authors
Yao, Hantao [1 ]
Zhang, Rui [2 ]
Xu, Changsheng [1 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, State Key Lab Processors, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
DOI
10.1109/CVPR52729.2023.00653
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning is an effective way to adapt a pretrained visual-language model (VLM) to downstream tasks using task-related textual tokens. Representative CoOp-based work combines learnable textual tokens with class tokens to obtain task-specific textual knowledge. However, this specific textual knowledge generalizes poorly to unseen classes because it forgets the essential general textual knowledge, which has strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization of the learnable prompt to unseen classes. The key insight of KgCoOp is that forgetting of essential knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the discrepancy between the textual embeddings generated by the learned prompt and those generated by the hand-crafted prompt. Finally, adding the KgCoOp term to the contrastive loss yields a prompt that is discriminative for both seen and unseen tasks. Extensive evaluation on several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., it achieves better performance with less training time.
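The abstract describes a loss that adds a knowledge-guided regularizer to the standard contrastive classification loss. A minimal numpy sketch of that idea is below; the function name, the temperature, and the weighting factor `lam` are illustrative assumptions (not taken from the paper), and the "hand-crafted" embeddings stand in for text features CLIP would produce from templates such as "a photo of a [class]":

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project rows onto the unit sphere, as in CLIP-style similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def kgcoop_loss(image_feat, learned_text, clip_text, label, lam=8.0, temp=0.01):
    """Sketch of a KgCoOp-style objective (assumed form, not the official code).

    image_feat:   (D,)   image embedding
    learned_text: (C, D) per-class text embeddings from the learnable prompt
    clip_text:    (C, D) per-class text embeddings from the hand-crafted prompt
    Returns cross-entropy over image-text similarities plus lam times the
    mean squared distance between learned and hand-crafted embeddings.
    """
    img = l2_normalize(image_feat)
    w = l2_normalize(learned_text)
    w_clip = l2_normalize(clip_text)

    # Contrastive term: softmax cross-entropy over cosine similarities.
    logits = img @ w.T / temp
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    l_ce = -np.log(probs[label])

    # Knowledge-guided term: keep learned embeddings close to CLIP's.
    l_kg = np.mean(np.sum((w - w_clip) ** 2, axis=-1))
    return l_ce + lam * l_kg
```

When the learned embeddings coincide with the hand-crafted ones, the regularizer vanishes and the objective reduces to the contrastive term; as training pulls the learned prompt away, the `l_kg` term penalizes the drift, which is the "discrepancy reduction" the abstract refers to.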
Pages: 6757-6767
Page count: 11
Related papers (50 total)
  • [1] Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
    Tian, Qiangxing
    Zhang, Min
    ENTROPY, 2025, 27 (03)
  • [2] VTPL: Visual and text prompt learning for visual-language models
    Sun, Bo
    Wu, Zhichao
    Zhang, Hao
    He, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104
  • [3] TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
    Yao, Hantao
    Zhang, Rui
    Xu, Changsheng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 23438 - 23448
  • [4] SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models
    Zhou, Yang
    Wu, Yongjian
    Saiyin, Jiya
    Wei, Bingzheng
    Lai, Maode
    Chang, Eric
    Xu, Yan
    COMPUTER VISION - ECCV 2024, PT XLIX, 2025, 15107 : 340 - 356
  • [5] CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models
    Huang, Min
    Yang, Chen
    Yu, Xiaoyan
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [6] Knowledge-Guided Prompt Learning for Tropical Cyclone Intensity Estimation
    Li, Wenhui
    Li, Yue
    Zhou, Ying
    Song, Dan
    Zhang, Jing
    Wei, Zhiqiang
    Liu, An-An
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [7] Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation
    Teng, Lin
    Zhao, Zihao
    Huang, Jiawei
    Cao, Zehong
    Meng, Runqi
    Shi, Feng
    Shen, Dinggang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT II, 2024, 15002 : 238 - 248
  • [8] Knowledge-Guided Prompt Learning for Few-Shot Text Classification
    Wang, Liangguo
    Chen, Ruoyu
    Li, Li
    ELECTRONICS, 2023, 12 (06)
  • [9] Knowledge-Enhanced Visual-Language Pretraining for Computational Pathology
    Zhou, Xiao
    Zhang, Xiaoman
    Wu, Chaoyi
    Zhang, Ya
    Xie, Weidi
    Wang, Yanfeng
    COMPUTER VISION - ECCV 2024, PT LII, 2025, 15110 : 345 - 362
  • [10] Knowledge assimilation: Implementing knowledge-guided agricultural large language model
    Jiang, Jingchi
    Yan, Lian
    Liu, Haifeng
    Xia, Zhenbo
    Wang, Haotian
    Yang, Yang
    Guan, Yi
    KNOWLEDGE-BASED SYSTEMS, 2025, 314