Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Cited by: 59
Authors
Yao, Hantao [1 ]
Zhang, Rui [2 ]
Xu, Changsheng [1 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, State Key Lab Processors, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
DOI
10.1109/CVPR52729.2023.00653
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning is an effective way to adapt a pretrained visual-language model (VLM) to downstream tasks using task-related textual tokens. Representative CoOp-based work combines learnable textual tokens with class tokens to obtain task-specific textual knowledge. However, this specific textual knowledge generalizes poorly to unseen classes because it forgets the essential general textual knowledge, which has strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization of the learnable prompt to unseen classes. The key insight of KgCoOp is that forgetting of essential knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the discrepancy between the textual embeddings generated by the learned prompt and those generated by the hand-crafted prompt. Finally, adding the KgCoOp term to the contrastive loss yields a prompt that is discriminative for both seen and unseen tasks. Extensive evaluation on several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., it achieves better performance with less training time.
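The abstract describes a loss that adds a knowledge-guided regularizer to the standard contrastive classification loss. A minimal numpy sketch of that idea is below; the function name, the temperature, and the weighting factor `lam` are illustrative assumptions (not taken from the paper), and the "hand-crafted" embeddings stand in for text features CLIP would produce from templates such as "a photo of a [class]":

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project rows onto the unit sphere, as in CLIP-style similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def kgcoop_loss(image_feat, learned_text, clip_text, label, lam=8.0, temp=0.01):
    """Sketch of a KgCoOp-style objective (assumed form, not the official code).

    image_feat:   (D,)   image embedding
    learned_text: (C, D) per-class text embeddings from the learnable prompt
    clip_text:    (C, D) per-class text embeddings from the hand-crafted prompt
    Returns cross-entropy over image-text similarities plus lam times the
    mean squared distance between learned and hand-crafted embeddings.
    """
    img = l2_normalize(image_feat)
    w = l2_normalize(learned_text)
    w_clip = l2_normalize(clip_text)

    # Contrastive term: softmax cross-entropy over cosine similarities.
    logits = img @ w.T / temp
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    l_ce = -np.log(probs[label])

    # Knowledge-guided term: keep learned embeddings close to CLIP's.
    l_kg = np.mean(np.sum((w - w_clip) ** 2, axis=-1))
    return l_ce + lam * l_kg
```

When the learned embeddings coincide with the hand-crafted ones, the regularizer vanishes and the objective reduces to the contrastive term; as training pulls the learned prompt away, the `l_kg` term penalizes the drift, which is the "discrepancy reduction" the abstract refers to.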
Pages: 6757-6767
Page count: 11
Related papers (50 total)
  • [1] Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
    Tian, Qiangxing
    Zhang, Min
    ENTROPY, 2025, 27 (03)
  • [2] VTPL: Visual and text prompt learning for visual-language models
    Sun, Bo
    Wu, Zhichao
    Zhang, Hao
    He, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104
  • [3] TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
    Yao, Hantao
    Zhang, Rui
    Xu, Changsheng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 23438 - 23448
  • [4] SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models
    Zhou, Yang
    Wu, Yongjian
    Saiyin, Jiya
    Wei, Bingzheng
    Lai, Maode
    Chang, Eric
    Xu, Yan
    COMPUTER VISION - ECCV 2024, PT XLIX, 2025, 15107 : 340 - 356
  • [5] CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models
    Huang, Min
    Yang, Chen
    Yu, Xiaoyan
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [6] Knowledge-Guided Prompt Learning for Tropical Cyclone Intensity Estimation
    Li, Wenhui
    Li, Yue
    Zhou, Ying
    Song, Dan
    Zhang, Jing
    Wei, Zhiqiang
    Liu, An-An
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [7] Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation
    Teng, Lin
    Zhao, Zihao
    Huang, Jiawei
    Cao, Zehong
    Meng, Runqi
    Shi, Feng
    Shen, Dinggang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT II, 2024, 15002 : 238 - 248
  • [8] Knowledge-Guided Prompt Learning for Few-Shot Text Classification
    Wang, Liangguo
    Chen, Ruoyu
    Li, Li
    ELECTRONICS, 2023, 12 (06)
  • [9] Knowledge-Enhanced Visual-Language Pretraining for Computational Pathology
    Zhou, Xiao
    Zhang, Xiaoman
    Wu, Chaoyi
    Zhang, Ya
    Xie, Weidi
    Wang, Yanfeng
    COMPUTER VISION - ECCV 2024, PT LII, 2025, 15110 : 345 - 362
  • [10] Knowledge assimilation: Implementing knowledge-guided agricultural large language model
    Jiang, Jingchi
    Yan, Lian
    Liu, Haifeng
    Xia, Zhenbo
    Wang, Haotian
    Yang, Yang
    Guan, Yi
    KNOWLEDGE-BASED SYSTEMS, 2025, 314