Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

Cited by: 0
Authors
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
Affiliations
[1] School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
[2] Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong
[3] School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Funding
National Natural Science Foundation of China
Keywords
Ladders; Semantics; Transfer learning; Visual languages
DOI
10.1016/j.patcog.2025.111460
Abstract
Pre-trained vision-language models (VLMs) have become the foundation of diverse intelligent services in daily life. Typical VLMs have large parameter scales and require heavy memory overhead during training, which makes adapting them to edge devices challenging. To enable memory-efficient VLMs, previous works mainly focus on prompt tuning, which optimizes trainable soft prompts instead of manually designing hard prompts. However, even though fewer than 3% of the parameters are updated, these methods still require the back-propagation chain to traverse the entire pre-trained model. Consequently, the intermediate activations and gradients occupy a significant amount of memory, greatly hindering deployment on resource-constrained edge devices. In view of the above, we propose a memory-efficient prompt-tuning method named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent that bypasses the VLM during back-propagation when optimizing the parameters of the designed multi-modal prompt module. The ladder network fuses the intermediate outputs of the VLM as guidance, and is initialized from a selection of important VLM parameters to maintain model performance. We also share the ladder network's parameters between the text and image branches to obtain more semantically aligned representations across modalities for the optimization of the prompt module. Experiments across seven datasets demonstrate that Prompt-Ladder reduces memory usage by at least 27% compared to baselines while maintaining relatively good performance. © 2025 Elsevier Ltd
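The memory saving the abstract describes comes from cutting the autograd graph at the frozen backbone: a small side network reads detached intermediate activations, so only its own (small) activations and gradients are kept during back-propagation. Below is a minimal PyTorch sketch of this ladder side-tuning idea, assuming a generic transformer encoder; the names (LadderSideNet, fuse projections, backbone_hidden_states) and the reduction factor r are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LadderSideNet(nn.Module):
    """Lightweight side network that consumes detached intermediate
    activations of a frozen backbone, so back-propagation never
    traverses the large pre-trained model (illustrative sketch)."""

    def __init__(self, dim: int, depth: int, r: int = 8):
        super().__init__()
        hid = dim // r  # down-projected width keeps the ladder cheap
        self.inp = nn.Linear(dim, hid)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(hid), nn.Linear(hid, hid), nn.GELU())
            for _ in range(depth)
        )
        # per-layer projections that fuse the backbone's hidden states
        self.fuse = nn.ModuleList(nn.Linear(dim, hid) for _ in range(depth))
        self.out = nn.Linear(hid, dim)

    def forward(self, x0: torch.Tensor, backbone_states: list) -> torch.Tensor:
        h = self.inp(x0)
        for blk, proj, s in zip(self.blocks, self.fuse, backbone_states):
            # detach() cuts the autograd graph: no activations or
            # gradients are stored inside the frozen VLM
            h = blk(h + proj(s.detach()))
        return self.out(h)

@torch.no_grad()
def backbone_hidden_states(encoder_layers, tokens):
    """Run the frozen encoder once and collect per-layer outputs.
    `encoder_layers` is assumed to be an iterable of transformer
    blocks (e.g. a ModuleList); not a specific library API."""
    states, h = [], tokens
    for layer in encoder_layers:
        h = layer(h)
        states.append(h)
    return states

# Usage sketch: only the ladder (and, in the full method, the soft
# prompts) would be handed to the optimizer; the frozen VLM is not.
side = LadderSideNet(dim=512, depth=12)
optimizer = torch.optim.AdamW(side.parameters(), lr=1e-3)
```

In a full setup, a single ladder instance could be reused for both the text and image encoders, mirroring the cross-modal parameter sharing the abstract describes.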
Related papers (50 total; items [41]-[50] shown)
  • [41] Prompt injection attacks on vision language models in oncology
    Clusmann, Jan; Ferber, Dyke; Wiest, Isabella C.; Schneider, Carolin V.; Brinker, Titus J.; Foersch, Sebastian; Truhn, Daniel; Kather, Jakob Nikolas
    Nature Communications, 16 (1)
  • [42] Tuning Vision-Language Models With Multiple Prototypes Clustering
    Guo, Meng-Hao; Zhang, Yi; Mu, Tai-Jiang; Huang, Sharon X.; Hu, Shi-Min
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (12): 11186-11199
  • [43] Prompt Tuning for Discriminative Pre-trained Language Models
    Yao, Yuan; Dong, Bowen; Zhang, Ao; Zhang, Zhengyan; Xie, Ruobing; Liu, Zhiyuan; Lin, Leyu; Sun, Maosong; Wang, Jianyong
    Findings of the Association for Computational Linguistics (ACL 2022), 2022: 3468-3473
  • [44] Read-only Prompt Optimization for Vision-Language Few-shot Learning
    Lee, Dongjun; Song, Seokwon; Suh, Jihee; Choi, Joonmyeong; Lee, Sanghyeok; Kim, Hyunwoo J.
    2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 1401-1411
  • [45] MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model
    Wang, P.; Zhang, H.; Yuan, Y.
    IEEE Transactions on Medical Imaging, 2024, 43 (12)
  • [46] Prompt tuning discriminative language models for hierarchical text classification
    du Toit, Jaco; Dunaiski, Marcel
    Natural Language Processing, 2024
  • [47] GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
    Li, Xin; Lian, Dongze; Lu, Zhihe; Bai, Jiawang; Chen, Zhibo; Wang, Xinchao
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [48] Soft prompt tuning for augmenting dense retrieval with large language models
    Peng, Zhiyuan; Wu, Xuyang; Wang, Qifan; Fang, Yi
    Knowledge-Based Systems, 2025, 309
  • [49] Robust Fine-Tuning of Vision-Language Models for Domain Generalization
    Vogt-Lowell, Kevin; Lee, Noah; Tsiligkaridis, Theodoros; Vaillant, Marc
    2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023
  • [50] Supporting vision-language model few-shot inference with confounder-pruned knowledge prompt
    Li, Jiangmeng; Mo, Wenyi; Song, Fei; Sun, Chuxiong; Qiang, Wenwen; Su, Bing; Zheng, Changwen
    Neural Networks, 2025, 185