SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models

Cited by: 0
Authors
Zhou, Yang [1 ]
Wu, Yongjian [1 ]
Saiyin, Jiya [1 ]
Wei, Bingzheng [2 ]
Lai, Maode [3 ]
Chang, Eric [4 ]
Xu, Yan [1 ]
Affiliations
[1] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
[4] Taiwan Artificial Intelligence Fdn, Taipei, Taiwan
Keywords
Prompt tuning; Parameter-efficient fine-tuning; Visual-language pre-trained models;
DOI
10.1007/978-3-031-72967-6_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning of large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered difficulties. Existing prompt tuning methods do not effectively address the problem of mapping and aligning tokens across modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal aligning space to represent the aligned semantics of the text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections, which require no training, to embed the information of the unified prototype tokens into the input spaces of the two modalities. These inverse linear projections allow the unified prototype tokens to represent both modalities synchronously, enabling SDPT to share the unified text-image semantics of a downstream task across the prompts of both modalities. Experimental results demonstrate that SDPT helps fusion-based VLPMs achieve superior results while training only 0.04% of the model parameters across various scenarios, outperforming other single- or dual-modal methods. The code will be released at https://github.com/wuyongjianCODE/SDPT.
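To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the stated idea: one trainable set of unified prototype tokens in the aligning space, plus fixed, training-free inverse linear projections that map those tokens into each modality's input space. All names, shapes, and the pseudo-inverse construction of the projections are illustrative assumptions, not the authors' implementation; the released code lives at https://github.com/wuyongjianCODE/SDPT.

import torch
import torch.nn as nn

class SDPTPrompts(nn.Module):
    """Illustrative sketch: unified prototype tokens shared across modalities."""

    def __init__(self, num_tokens=8, align_dim=256, text_dim=768, image_dim=1024):
        super().__init__()
        # The only trainable parameters: prototype tokens in the aligning space.
        self.unified_tokens = nn.Parameter(0.02 * torch.randn(num_tokens, align_dim))
        # Stand-ins for the VLPM's frozen text/image -> aligning-space mappings
        # (in practice these would come from the pre-trained model).
        W_text = torch.randn(text_dim, align_dim)
        W_image = torch.randn(image_dim, align_dim)
        # Training-free "inverse linear projections": pseudo-inverses that map
        # aligning-space tokens back into each modality's input space.
        self.register_buffer("inv_text", torch.linalg.pinv(W_text))    # (align_dim, text_dim)
        self.register_buffer("inv_image", torch.linalg.pinv(W_image))  # (align_dim, image_dim)

    def forward(self):
        # Both modal prompts derive from the same tokens, so the two
        # modalities stay synchronized while tuning.
        text_prompts = self.unified_tokens @ self.inv_text      # (num_tokens, text_dim)
        image_prompts = self.unified_tokens @ self.inv_image    # (num_tokens, image_dim)
        return text_prompts, image_prompts

# Usage: prepend text_prompts / image_prompts to the frozen VLPM's token
# sequences; only unified_tokens receives gradients.
prompts = SDPTPrompts()
text_p, image_p = prompts()
print(text_p.shape, image_p.shape)  # torch.Size([8, 768]) torch.Size([8, 1024])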
Pages: 340-356
Number of pages: 17