Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Cited by: 59
Authors
Yao, Hantao [1]
Zhang, Rui [2]
Xu, Changsheng [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, State Key Lab Processors, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
DOI
10.1109/CVPR52729.2023.00653
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Prompt tuning is an effective way to adapt a pretrained visual-language model (VLM) to a downstream task using task-related textual tokens. Representative CoOp-based work combines learnable textual tokens with the class tokens to obtain specific textual knowledge. However, this specific textual knowledge generalizes worse to unseen classes because it forgets the essential general textual knowledge that has strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. The key insight of KgCoOp is that the forgetting of essential knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the discrepancy between the textual embeddings generated by the learned prompts and those generated by the hand-crafted prompts. Finally, adding the KgCoOp term to the contrastive loss yields a prompt that is discriminative for both seen and unseen tasks. Extensive evaluation on several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., it achieves better performance with less training time.
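The objective described in the abstract (a contrastive loss plus a term that penalizes the discrepancy between learned-prompt and hand-crafted-prompt text embeddings) can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the abstract does not specify the discrepancy measure, so the squared Euclidean distance on L2-normalized embeddings, the weight `lam`, the `temperature` value, and all function names here are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kg_discrepancy(learned_text, handcrafted_text):
    """Mean squared Euclidean distance between per-class text embeddings.

    Assumption: the abstract only says "discrepancy"; this distance is
    one plausible choice, not necessarily the paper's.
    """
    diff = l2_normalize(learned_text) - l2_normalize(handcrafted_text)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def contrastive_ce(image_feats, text_feats, labels, temperature=0.01):
    """Cross-entropy over cosine-similarity logits (image vs. class text)."""
    logits = l2_normalize(image_feats) @ l2_normalize(text_feats).T
    logits = logits / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def kgcoop_style_loss(image_feats, learned_text, handcrafted_text,
                      labels, lam=1.0):
    """Contrastive loss plus a weighted knowledge-guided discrepancy term.

    `lam` is a hypothetical weighting hyperparameter.
    """
    ce = contrastive_ce(image_feats, learned_text, labels)
    kg = kg_discrepancy(learned_text, handcrafted_text)
    return ce + lam * kg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))        # 4 image features, dim 8
    handcrafted = rng.normal(size=(3, 8))   # 3 classes, fixed text embeddings
    learned = handcrafted + 0.1 * rng.normal(size=(3, 8))
    labels = np.array([0, 1, 2, 1])
    print(kgcoop_style_loss(images, learned, handcrafted, labels, lam=1.0))
```

In this sketch only `learned_text` would depend on trainable prompt tokens; the hand-crafted embeddings stay fixed, so the extra term pulls the learned prompt toward the general knowledge that the hand-crafted prompt encodes.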
Pages: 6757-6767
Page count: 11