SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models

Times Cited: 0
Authors
Zhou, Yang [1 ]
Wu, Yongjian [1 ]
Saiyin, Jiya [1 ]
Wei, Bingzheng [2 ]
Lai, Maode [3 ]
Chang, Eric [4 ]
Xu, Yan [1 ]
Affiliations
[1] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
[4] Taiwan Artificial Intelligence Fdn, Taipei, Taiwan
Keywords
Prompt tuning; Parameter-efficient fine-tuning; Visual-language pre-trained models;
DOI
10.1007/978-3-031-72967-6_19
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning of large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered issues. Existing prompt tuning methods have not effectively addressed the problem of mapping and aligning tokens across modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal aligning space to represent the aligned semantics of the text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections that require no training to embed the information of the unified prototype tokens into the input spaces of the two modalities. These inverse linear projections allow the unified prototype tokens to represent both modalities synchronously, enabling SDPT to share the unified text-image semantics of a downstream task across the different modal prompts. Experimental results demonstrate that SDPT helps fusion-based VLPMs achieve superior results while training only 0.04% of model parameters across various scenarios, outperforming other single- and dual-modal methods. The code will be released at https://github.com/wuyongjianCODE/SDPT.
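
The mechanism described in the abstract can be summarized in a few lines of PyTorch. The sketch below is illustrative only and is not taken from the official repository: the module name SDPTPrompts, the tensor shapes, and the use of pseudo-inverses of the frozen model's projections as the training-free "inverse linear projections" are assumptions about how a single set of unified prototype tokens could drive both modal prompts from one set of trainable parameters.

import torch
import torch.nn as nn

class SDPTPrompts(nn.Module):
    # Hypothetical sketch of SDPT's prompt machinery (names and shapes assumed).
    # A single set of learnable unified prototype tokens lives in the shared
    # text-image aligning space; frozen pseudo-inverses of the VLPM's own
    # projections map those tokens back into each modality's input space,
    # so both modal prompts are derived from the same trainable parameters.
    def __init__(self, num_tokens, align_dim, text_proj, image_proj):
        super().__init__()
        # Unified prototype tokens: the only trainable tensor in this module.
        self.prototype = nn.Parameter(0.02 * torch.randn(num_tokens, align_dim))
        # text_proj (text_dim, align_dim) and image_proj (image_dim, align_dim)
        # stand in for the VLPM's frozen input-to-aligning-space projections.
        # Their pseudo-inverses give training-free inverse linear projections.
        self.register_buffer("inv_text", torch.linalg.pinv(text_proj))    # (align_dim, text_dim)
        self.register_buffer("inv_image", torch.linalg.pinv(image_proj))  # (align_dim, image_dim)

    def forward(self):
        # Synchronously produce both modal prompts from the shared tokens.
        text_prompts = self.prototype @ self.inv_text     # (num_tokens, text_dim)
        image_prompts = self.prototype @ self.inv_image   # (num_tokens, image_dim)
        return text_prompts, image_prompts

# Toy usage with random stand-in projections; in practice text_proj and
# image_proj would come from the frozen fusion-based VLPM (e.g. GLIP).
text_proj = torch.randn(768, 256)
image_proj = torch.randn(1024, 256)
prompts = SDPTPrompts(num_tokens=8, align_dim=256, text_proj=text_proj, image_proj=image_proj)
text_p, image_p = prompts()  # prepend to the text / image token sequences

Under these assumptions, only the prototype tokens receive gradients, which is consistent with the paper's claim of tuning a very small fraction of model parameters while keeping the backbone and the projections frozen.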
Pages: 340-356
Page count: 17