JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models

Cited: 0
Authors
Guo, Yuncheng [1 ]
Guo, Xiaodong [1 ]
Affiliations
[1] Fudan University, Department of Electronic Engineering, Shanghai 200438, China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02711
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Leveraging few-shot datasets in prompt learning for Vision-Language Models eliminates the need for manual prompt engineering, but it makes accurate label annotations essential. However, severe or complex label noise undermines prompt learning for Vision-Language Models. To address this issue, we propose a new framework that improves its robustness. Specifically, we introduce Joint Adaptive Partitioning for Label Refurbishment (JoAPR), a structured framework comprising two key steps: 1) Data Partitioning, where we differentiate between clean and noisy data using joint adaptive thresholds; and 2) Label Refurbishment, where we correct the labels based on the partition outcomes before retraining the network. Our comprehensive experiments confirm that JoAPR substantially enhances the robustness of prompt learning for Vision-Language Models against label noise, offering a promising direction for future research.
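The abstract names the two steps but not their internals, so the sketch below is illustrative only and not the paper's actual algorithm. It assumes a common loss-modeling recipe from the noisy-label literature (fitting a two-component Gaussian mixture to per-sample losses, as in DivideMix) for the partitioning step, and a simple clean-probability-weighted blend of the given label and the model prediction for the refurbishment step. All names (partition_by_loss, refurbish_labels, clean_threshold) are hypothetical.

```python
# Illustrative sketch only: the paper's "joint adaptive thresholds" are not
# detailed in the abstract, so partitioning here falls back to the common
# practice of fitting a 2-component GMM to per-sample losses (DivideMix-style).
import numpy as np
from sklearn.mixture import GaussianMixture

def partition_by_loss(losses, clean_threshold=0.5):
    """Step 1 - Data Partitioning: samples whose posterior probability of
    belonging to the low-loss GMM component exceeds the threshold are
    treated as clean."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    low_loss = int(np.argmin(gmm.means_.ravel()))   # clean = low-loss mode
    p_clean = gmm.predict_proba(losses)[:, low_loss]
    return p_clean > clean_threshold, p_clean

def refurbish_labels(onehot_labels, model_probs, is_clean, p_clean):
    """Step 2 - Label Refurbishment: keep labels judged clean; for noisy
    samples, blend the given one-hot label with the model's softmax
    prediction, weighted by the estimated clean probability."""
    w = p_clean[:, None]
    blended = w * onehot_labels + (1.0 - w) * model_probs
    return np.where(is_clean[:, None], onehot_labels, blended)
```

In the full pipeline, the prompt learner would then be retrained on the refurbished labels and the loop repeated each epoch; that retraining step is omitted above.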
Pages: 28695-28705
Page count: 11
Related Papers (items 41-50 of 50)
• [41] A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter. Xing, Jialu; Liu, Jianping; Wang, Jian; Sun, Lulu; Chen, Xi; Gu, Xunxun; Wang, Yingfei. COMPUTERS & GRAPHICS-UK, 2024, 119.
• [42] Vision-Language Models for Vision Tasks: A Survey. Zhang, Jingyi; Huang, Jiaxing; Jin, Sheng; Lu, Shijian. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08): 5625-5644.
• [43] GalLoP: Learning Global and Local Prompts for Vision-Language Models. Lafon, Marc; Ramzi, Elias; Rambour, Clement; Audebert, Nicolas; Thome, Nicolas. COMPUTER VISION - ECCV 2024, PT LXI, 2025, 15119: 264-282.
• [44] Adapting Vision-Language Models via Learning to Inject Knowledge. Xuan, Shiyu; Yang, Ming; Zhang, Shiliang. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 5798-5809.
• [45] Visual In-Context Learning for Large Vision-Language Models. Zhou, Yucheng; Le, Xiang; Wang, Qianning; Shen, Jianbing. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 15890-15902.
• [46] Learning the Visualness of Text Using Large Vision-Language Models. Verma, Gaurav; Rossi, Ryan A.; Tensmeyer, Christopher; Gu, Jiuxiang; Nenkova, Ani. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 2394-2408.
• [47] MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion. Fan, Hao; Ma, Zhaoyang; Li, Yong; Tian, Rui; Chen, Yunli; Gao, Chenlong. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024, 2024, 14870: 328-339.
• [48] Multi-task prompt tuning with soft context sharing for vision-language models. Ding, Kun; Wang, Ying; Liu, Pengzhang; Yu, Qiang; Zhang, Haojian; Xiang, Shiming; Pan, Chunhong. NEUROCOMPUTING, 2024, 603.
• [49] CSP-DCPE: Category-Specific Prompt with Deep Contextual Prompt Enhancement for Vision-Language Models. Wu, Chunlei; Wu, Yixiang; Xu, Qinfu; Zi, Xuebin. ELECTRONICS, 2025, 14 (04).
• [50] Read-only Prompt Optimization for Vision-Language Few-shot Learning. Lee, Dongjun; Song, Seokwon; Suh, Jihee; Choi, Joonmyeong; Lee, Sanghyeok; Kim, Hyunwoo J. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 1401-1411.