Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

Cited by: 0
Authors
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
Affiliations
[1] School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
[2] Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong
[3] School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Funding
National Natural Science Foundation of China
Keywords
Ladders; Semantics; Transfer learning; Visual languages
DOI
10.1016/j.patcog.2025.111460
Abstract
Pre-trained vision-language models (VLMs) have become the foundation for diverse intelligent services. However, common VLMs have large parameter scales and incur heavy memory overhead during training, which makes adapting them to edge devices challenging. To enable memory-efficient VLMs, previous works mainly focus on prompt engineering, which learns trainable soft prompts instead of manually designed hard prompts. Yet even though fewer than 3% of the parameters are updated, these methods still require the back-propagation chain to traverse the entire pre-trained model. Consequently, the intermediate activations and gradients occupy a significant amount of memory, greatly hindering adaptation on resource-constrained edge devices. In view of the above, we propose a memory-efficient prompt-tuning method named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent that bypasses the VLM during back-propagation when optimizing the parameters of the designed multi-modal prompt module. The ladder network fuses the intermediate outputs of the VLM as a guide and selects important VLM parameters for its initialization to maintain model performance. We also share the ladder network's parameters between the text and image modalities to obtain a more semantically aligned cross-modal representation for optimizing the prompt module. Experiments across seven datasets demonstrate that Prompt-Ladder reduces memory usage by at least 27% compared to baselines while maintaining competitive performance. © 2025 Elsevier Ltd
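The core memory-saving mechanism described in the abstract — a small trainable side ("ladder") network that consumes the frozen backbone's intermediate outputs, so the back-propagation chain never traverses the backbone's parameters — can be sketched with a toy example. The two-layer "backbone", the single-matrix ladder head, and all dimensions below are illustrative assumptions for exposition, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained backbone": two linear layers with ReLU.
# Scaled down so activations stay well-conditioned for the toy ladder.
backbone_W1 = rng.standard_normal((8, 16)) * 0.3
backbone_W2 = rng.standard_normal((16, 16)) * 0.3

# Lightweight trainable ladder head: the ONLY parameters we update.
ladder_W = rng.standard_normal((16, 4)) * 0.1

def backbone_forward(x):
    # Forward pass through the frozen backbone; its outputs are treated
    # as constants, so no activations must be cached for back-propagation.
    h1 = np.maximum(x @ backbone_W1, 0.0)
    return np.maximum(h1 @ backbone_W2, 0.0)

def train_step(x, y_onehot, lr=0.05):
    # One gradient-descent step on a softmax classifier attached to the ladder.
    global ladder_W
    feats = backbone_forward(x)            # frozen path: no gradients kept
    logits = feats @ ladder_W              # trainable ladder path
    shifted = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = shifted / shifted.sum(axis=1, keepdims=True)
    # Back-propagation stops at the ladder: the chain never reaches
    # backbone_W1 / backbone_W2, mirroring the memory-saving idea.
    grad_logits = (probs - y_onehot) / len(x)
    ladder_W -= lr * feats.T @ grad_logits
    return -np.log(probs[np.arange(len(x)), y_onehot.argmax(1)]).mean()

x = rng.standard_normal((32, 8))
y = np.eye(4)[rng.integers(0, 4, size=32)]
losses = [train_step(x, y) for _ in range(50)]
print(round(losses[0], 3), round(losses[-1], 3))  # cross-entropy drops
```

Because only `ladder_W` ever receives a gradient, a framework with automatic differentiation would store activations only for the tiny ladder path; the frozen backbone contributes forward-pass compute but no gradient or activation memory, which is the intuition behind the reported memory savings.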