Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

Cited by: 0
Authors
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
Affiliations
[1] School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
[2] Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong
[3] School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Funding
National Natural Science Foundation of China
Keywords
Ladders; Semantics; Transfer learning; Visual languages
DOI
10.1016/j.patcog.2025.111460
Abstract
Pre-trained vision-language models (VLMs) have become the foundation of diverse intelligent services in daily life. Typical VLMs have large parameter scales and require heavy memory overhead during training, which makes adapting them to edge devices challenging. To enable memory-efficient VLMs, previous works mainly focus on prompt tuning, which optimizes trainable soft prompts instead of manually designing hard prompts. However, even though fewer than 3% of the parameters are updated, these methods still require the back-propagation chain to traverse the entire pre-trained model. Consequently, the intermediate activations and gradients occupy a significant amount of memory, greatly hindering deployment on resource-constrained edge devices. In view of the above, we propose a memory-efficient prompt-tuning method named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent that bypasses the VLM during back-propagation when optimizing the parameters of the designed multi-modal prompt module. The ladder network fuses the intermediate outputs of the VLM as guidance, and is initialized from a selection of important VLM parameters to maintain model performance. We also share the ladder network's parameters between the text and image branches to obtain more semantically aligned representations across modalities for the optimization of the prompt module. Experiments across seven datasets demonstrate that Prompt-Ladder reduces memory usage by at least 27% compared to baselines while maintaining relatively good performance. © 2025 Elsevier Ltd
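The memory saving the abstract describes comes from cutting the autograd graph at the frozen backbone: a small side network reads detached intermediate activations, so only its own (small) activations and gradients are kept during back-propagation. Below is a minimal PyTorch sketch of this ladder side-tuning idea, assuming a generic transformer encoder; the names (LadderSideNet, fuse projections, backbone_hidden_states) and the reduction factor r are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LadderSideNet(nn.Module):
    """Lightweight side network that consumes detached intermediate
    activations of a frozen backbone, so back-propagation never
    traverses the large pre-trained model (illustrative sketch)."""

    def __init__(self, dim: int, depth: int, r: int = 8):
        super().__init__()
        hid = dim // r  # down-projected width keeps the ladder cheap
        self.inp = nn.Linear(dim, hid)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(hid), nn.Linear(hid, hid), nn.GELU())
            for _ in range(depth)
        )
        # per-layer projections that fuse the backbone's hidden states
        self.fuse = nn.ModuleList(nn.Linear(dim, hid) for _ in range(depth))
        self.out = nn.Linear(hid, dim)

    def forward(self, x0: torch.Tensor, backbone_states: list) -> torch.Tensor:
        h = self.inp(x0)
        for blk, proj, s in zip(self.blocks, self.fuse, backbone_states):
            # detach() cuts the autograd graph: no activations or
            # gradients are stored inside the frozen VLM
            h = blk(h + proj(s.detach()))
        return self.out(h)

@torch.no_grad()
def backbone_hidden_states(encoder_layers, tokens):
    """Run the frozen encoder once and collect per-layer outputs.
    `encoder_layers` is assumed to be an iterable of transformer
    blocks (e.g. a ModuleList); not a specific library API."""
    states, h = [], tokens
    for layer in encoder_layers:
        h = layer(h)
        states.append(h)
    return states

# Usage sketch: only the ladder (and, in the full method, the soft
# prompts) would be handed to the optimizer; the frozen VLM is not.
side = LadderSideNet(dim=512, depth=12)
optimizer = torch.optim.AdamW(side.parameters(), lr=1e-3)
```

In a full setup, a single ladder instance could be reused for both the text and image encoders, mirroring the cross-modal parameter sharing the abstract describes.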
Related papers (50 total; items [41]-[50] shown)
  • [41] Prompt injection attacks on vision language models in oncology
    Clusmann, Jan; Ferber, Dyke; Wiest, Isabella C.; Schneider, Carolin V.; Brinker, Titus J.; Foersch, Sebastian; Truhn, Daniel; Kather, Jakob Nikolas
    Nature Communications, 16 (1)
  • [42] Tuning Vision-Language Models With Multiple Prototypes Clustering
    Guo, Meng-Hao; Zhang, Yi; Mu, Tai-Jiang; Huang, Sharon X.; Hu, Shi-Min
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (12): 11186-11199
  • [43] Prompt Tuning for Discriminative Pre-trained Language Models
    Yao, Yuan; Dong, Bowen; Zhang, Ao; Zhang, Zhengyan; Xie, Ruobing; Liu, Zhiyuan; Lin, Leyu; Sun, Maosong; Wang, Jianyong
    Findings of the Association for Computational Linguistics (ACL 2022), 2022: 3468-3473
  • [44] Read-only Prompt Optimization for Vision-Language Few-shot Learning
    Lee, Dongjun; Song, Seokwon; Suh, Jihee; Choi, Joonmyeong; Lee, Sanghyeok; Kim, Hyunwoo J.
    2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 1401-1411
  • [45] MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model
    Wang, P.; Zhang, H.; Yuan, Y.
    IEEE Transactions on Medical Imaging, 2024, 43 (12)
  • [46] Prompt tuning discriminative language models for hierarchical text classification
    du Toit, Jaco; Dunaiski, Marcel
    Natural Language Processing, 2024
  • [47] GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
    Li, Xin; Lian, Dongze; Lu, Zhihe; Bai, Jiawang; Chen, Zhibo; Wang, Xinchao
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [48] Soft prompt tuning for augmenting dense retrieval with large language models
    Peng, Zhiyuan; Wu, Xuyang; Wang, Qifan; Fang, Yi
    Knowledge-Based Systems, 2025, 309
  • [49] Robust Fine-Tuning of Vision-Language Models for Domain Generalization
    Vogt-Lowell, Kevin; Lee, Noah; Tsiligkaridis, Theodoros; Vaillant, Marc
    2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023
  • [50] Supporting vision-language model few-shot inference with confounder-pruned knowledge prompt
    Li, Jiangmeng; Mo, Wenyi; Song, Fei; Sun, Chuxiong; Qiang, Wenwen; Su, Bing; Zheng, Changwen
    Neural Networks, 2025, 185