SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models

Citations: 0
Authors
Zhou, Yang [1 ]
Wu, Yongjian [1 ]
Saiyin, Jiya [1 ]
Wei, Bingzheng [2 ]
Lai, Maode [3 ]
Chang, Eric [4 ]
Xu, Yan [1 ]
Affiliations
[1] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
[4] Taiwan Artificial Intelligence Fdn, Taipei, Taiwan
Source
Keywords
Prompt tuning; Parameter-efficient fine-tuning; Visual-language pre-trained models;
DOI
10.1007/978-3-031-72967-6_19
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning of large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered issues. Existing prompt tuning methods have not effectively addressed the modal mapping and aligning problem for tokens of different modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal aligning space to represent the aligned semantics of the text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections, which require no training, to embed the information of the unified prototype tokens into the input spaces of the different modalities. These inverse linear projections allow the unified prototype tokens to synchronously represent both modalities and enable SDPT to share the unified text-image semantics of downstream tasks across the different modal prompts. Experimental results demonstrate that SDPT helps fusion-based VLPMs achieve superior outcomes while training only 0.04% of model parameters across various scenarios, outperforming other single- or dual-modal methods. The code will be released at https://github.com/wuyongjianCODE/SDPT.
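The mechanism described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation: it assumes the frozen VLPM exposes fixed linear projections W_t and W_v that map text and image tokens into the shared aligning space, and all names (SDPTPrompts, W_t, W_v) are hypothetical. The Moore-Penrose pseudo-inverse is used here as one plausible realization of the training-free inverse projections mentioned above.

```python
import torch
import torch.nn as nn


class SDPTPrompts(nn.Module):
    """Unified prototype tokens shared across modalities (illustrative sketch only)."""

    def __init__(self, num_tokens: int, d_align: int,
                 W_t: torch.Tensor, W_v: torch.Tensor):
        super().__init__()
        # A single set of learnable unified prototype tokens living in the
        # frozen model's modal aligning space.
        self.prototypes = nn.Parameter(0.02 * torch.randn(num_tokens, d_align))
        # Training-free inverse linear projections back into each modality's
        # input space, realized here via the pseudo-inverse of the frozen
        # alignment matrices W_t (d_text x d_align) and W_v (d_img x d_align).
        # This construction is an assumption for illustration.
        self.register_buffer("inv_t", torch.linalg.pinv(W_t))  # (d_align, d_text)
        self.register_buffer("inv_v", torch.linalg.pinv(W_v))  # (d_align, d_img)

    def forward(self) -> tuple[torch.Tensor, torch.Tensor]:
        # The same prototypes are projected synchronously into both input
        # spaces, so the text and image prompts share one set of semantics.
        text_prompts = self.prototypes @ self.inv_t    # (num_tokens, d_text)
        image_prompts = self.prototypes @ self.inv_v   # (num_tokens, d_img)
        return text_prompts, image_prompts
```

In such a setup, the returned prompts would be prepended to the frozen VLPM's text and image token sequences, and only `prototypes` would receive gradients during downstream fine-tuning, which is consistent with the very small trainable-parameter budget reported in the abstract.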
Pages: 340-356
Number of pages: 17
Related Papers
50 records in total
  • [1] Prompt Tuning for Discriminative Pre-trained Language Models
    Yao, Yuan
    Dong, Bowen
    Zhang, Ao
    Zhang, Zhengyan
    Xie, Ruobing
    Liu, Zhiyuan
    Lin, Leyu
    Sun, Maosong
    Wang, Jianyong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3468 - 3473
  • [2] Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
    Xing, Yinghui
    Wu, Qirui
    Cheng, De
    Zhang, Shizhou
    Liang, Guoqiang
    Wang, Peng
    Zhang, Yanning
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2056 - 2068
  • [3] APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models
    Wang, Qifan
    Mao, Yuning
    Wang, Jingang
    Yu, Hanchao
    Li, Shaoliang
    Wang, Sinong
    Feng, Fuli
    Huang, Lifu
    Quan, Xiaojun
    Xu, Zenglin
    Liu, Dongfang
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9147 - 9160
  • [4] CPT: Colorful Prompt Tuning for pre-trained vision-language models
    Yao, Yuan
    Zhang, Ao
    Zhang, Zhengyan
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    AI OPEN, 2024, 5 : 30 - 38
  • [5] Zero-Shot Nuclei Detection via Visual-Language Pre-trained Models
    Wu, Yongjian
    Zhou, Yang
    Saiyin, Jiya
    Wei, Bingzheng
    Lai, Maode
    Shou, Jianzhong
    Fan, Yubo
    Xu, Yan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 693 - 703
  • [6] DVPT: Dynamic Visual Prompt Tuning of large pre-trained models for medical image analysis
    He, Along
    Wu, Yanlin
    Wang, Zhihong
    Li, Tao
    Fu, Huazhu
    NEURAL NETWORKS, 2025, 185
  • [7] Constraint embedding for prompt tuning in vision-language pre-trained model
    Cheng, Keyang
    Wei, Liutao
    Tang, Jingfeng
    Zhan, Yongzhao
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [8] Relational Prompt-Based Pre-Trained Language Models for Social Event Detection
    Li, Pu
    Yu, Xiaoyan
    Peng, Hao
    Xian, Yantuan
    Wang, Linqin
    Sun, Li
    Zhang, Jingyun
    Yu, Philip S.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2024, 43 (01)
  • [9] KG-prompt: Interpretable knowledge graph prompt for pre-trained language models
    Chen, Liyi
    Liu, Jie
    Duan, Yutai
    Wang, Runze
    KNOWLEDGE-BASED SYSTEMS, 2025, 311