Learning to Learn Better Visual Prompts

Cited by: 0
Authors
Wang, Fengxiang [1 ]
Huang, Wanrong [1 ]
Yang, Shaowu [1 ]
Qi, Fan [2 ]
Lan, Long [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp Sci & Technol, HPCL, Changsha, Hunan, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
None available
CLC classification
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning offers a low-cost way to adapt vision-language models (VLMs) to various downstream vision tasks without updating the huge set of pre-trained parameters. Dispensing with the conventional manual crafting of prompts, the recent prompt-tuning method Context Optimization (CoOp) introduces learnable vectors as text prompts. Nevertheless, several previous works point out that CoOp-based approaches overfit easily to the base classes and generalize poorly to novel classes. In this paper, we argue that prompt tuning works well only on the base classes because of the limited capacity of the learnable vectors: the pre-trained model is roughly a hundred times the scale of the learnable vector, so the learned vector has very limited ability to absorb knowledge of novel classes. To reduce this excessive overfitting of textual knowledge to the base classes, we view prompt tuning as learning to learn (LoL) and learn the prompt through meta-learning: dividing the base classes into many different subclasses during training fully exerts the limited capacity of prompt tuning and thus transfers its power to recognizing the novel classes. Specifically, we first fine-tune the pre-trained CLIP on the base classes with the CoOp method. Then, starting from the fine-tuned CLIP model, we perform further fine-tuning on the base classes in an N-way K-shot manner from the meta-learning perspective. Finally, we apply the learned textual vector and the VLM to unseen classes. Extensive experiments on benchmark datasets validate the efficacy of our meta-learning-informed prompt tuning, affirming its role as a robust optimization strategy for VLMs.
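The episodic training the abstract describes starts from N-way K-shot sampling: each meta-learning step draws N base classes, K labeled support examples, and a handful of query examples per class. A minimal sketch of that sampling step is shown below; the function name `sample_episode` and the toy data are illustrative, not part of the paper's released code.

```python
import random

def sample_episode(base_classes, n_way, k_shot, q_query, seed=None):
    """Draw one N-way K-shot episode from a pool of base-class data.

    base_classes: dict mapping class name -> list of examples.
    Returns (support, query), each a list of (example, class) pairs,
    with no example shared between the support and query sets.
    """
    rng = random.Random(seed)
    # Pick N distinct classes for this episode.
    classes = rng.sample(sorted(base_classes), n_way)
    support, query = [], []
    for c in classes:
        # Draw K + Q distinct examples; first K go to support, rest to query.
        picks = rng.sample(base_classes[c], k_shot + q_query)
        support += [(x, c) for x in picks[:k_shot]]
        query += [(x, c) for x in picks[k_shot:]]
    return support, query

# Toy base-class pool: 5 classes with 10 examples each.
pool = {f"class{i}": [f"img{i}_{j}" for j in range(10)] for i in range(5)}
support, query = sample_episode(pool, n_way=3, k_shot=2, q_query=4, seed=0)
```

In the method's second stage, each such episode would drive one update of the learnable prompt vectors: the (frozen) CLIP encoders score the query examples against text prompts built for the N sampled classes, and the loss on the query set updates only the prompt.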
Pages: 5354 - 5363
Page count: 10