SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models

Cited by: 0
Authors
Zhou, Yang [1 ]
Wu, Yongjian [1 ]
Saiyin, Jiya [1 ]
Wei, Bingzheng [2 ]
Lai, Maode [3 ]
Chang, Eric [4 ]
Xu, Yan [1 ]
Affiliations
[1] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
[4] Taiwan Artificial Intelligence Fdn, Taipei, Taiwan
Source
Keywords
Prompt tuning; Parameter-efficient fine-tuning; Visual-language pre-trained models;
DOI
10.1007/978-3-031-72967-6_19
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning of large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered issues. Existing prompt tuning methods have not effectively addressed the modal mapping and aligning problem for tokens in different modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal-aligning space to represent the aligned semantics of the text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections that require no training to embed the information of the unified prototype tokens into the input spaces of the different modalities. These inverse linear projections allow the unified prototype tokens to synchronously represent both modalities and enable SDPT to share the unified semantics of text and image for downstream tasks across the different modal prompts. Experimental results demonstrate that SDPT helps fusion-based VLPMs achieve superior outcomes while training only 0.04% of the model parameters across various scenarios, outperforming other single- or dual-modal methods. The code will be released at https://github.com/wuyongjianCODE/SDPT.
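To make the abstract more concrete, below is a minimal PyTorch sketch of the idea it describes: a single set of trainable unified prototype tokens is mapped into the text and image prompt spaces through fixed, training-free inverse linear projections, so both modal prompts share the same learned semantics. The class name, dimensions, and the use of a Moore-Penrose pseudo-inverse are illustrative assumptions, not the authors' released implementation (see the GitHub link in the abstract).

```python
import torch
import torch.nn as nn


class UnifiedPrototypePrompt(nn.Module):
    """Trainable unified prototype tokens shared by both modal prompts (illustrative sketch)."""

    def __init__(self, num_tokens: int, unified_dim: int,
                 text_proj: torch.Tensor, image_proj: torch.Tensor):
        super().__init__()
        # The only trainable parameters: one set of unified prototype tokens
        # living in the (assumed) modal-aligning space.
        self.prototypes = nn.Parameter(torch.randn(num_tokens, unified_dim) * 0.02)
        # Training-free "inverse linear projections": pseudo-inverses of the
        # frozen text/image aligning maps, computed once and stored as buffers.
        self.register_buffer("inv_text", torch.linalg.pinv(text_proj))
        self.register_buffer("inv_image", torch.linalg.pinv(image_proj))

    def forward(self):
        # Map the shared prototypes back into each modality's input space,
        # yielding synchronized text and image prompt tokens.
        text_prompts = self.prototypes @ self.inv_text     # (num_tokens, text_dim)
        image_prompts = self.prototypes @ self.inv_image   # (num_tokens, image_dim)
        return text_prompts, image_prompts


if __name__ == "__main__":
    # Hypothetical dimensions and randomly drawn aligning maps; in a real
    # setup these projections would come from the frozen VLPM (e.g. GLIP).
    unified_dim, text_dim, image_dim = 256, 768, 1024
    text_proj = torch.randn(text_dim, unified_dim)    # row-vector map: text_dim -> unified_dim
    image_proj = torch.randn(image_dim, unified_dim)  # row-vector map: image_dim -> unified_dim
    prompt = UnifiedPrototypePrompt(num_tokens=8, unified_dim=unified_dim,
                                    text_proj=text_proj, image_proj=image_proj)
    text_prompts, image_prompts = prompt()
    print(text_prompts.shape, image_prompts.shape)  # (8, 768) and (8, 1024)
```

Under these assumptions, only the prototype tokens would receive gradients, which is consistent with the abstract's claim of tuning a very small fraction of the model parameters.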
Pages: 340-356
Page count: 17
Related Papers
50 in total (items 21-30 shown)
  • [21] Co2PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
    Dong, Xiangjue
    Zhu, Ziwei
    Wang, Zhuoer
    Teleki, Maria
    Caverlee, James
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5859 - 5871
  • [22] Prompt Learning with Structured Semantic Knowledge Makes Pre-Trained Language Models Better
    Zheng, Hai-Tao
    Xie, Zuotong
    Liu, Wenqiang
    Huang, Dongxiao
    Wu, Bei
    Kim, Hong-Gee
    ELECTRONICS, 2023, 12 (15)
  • [23] Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models
    Liu, Qingyi
    Qin, Jinghui
    Ye, Wenxuan
    Mou, Hao
    He, Yuxuan
    Wang, Keze
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18689 - 18697
  • [24] Debiasing Pre-Trained Language Models via Efficient Fine-Tuning
    Gira, Michael
    Zhang, Ruisu
    Lee, Kangwook
    PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022), 2022, : 59 - 69
  • [25] Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Action Recognition
    Bandara, Wele Gedara Chaminda
    Patel, Vishal M.
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [26] Context-focused Prompt Tuning Pre-trained Code Models to Improve Code Summarization
    Pan, Xinglu
    Liu, Chenxiao
    Zou, Yanzhen
    Zhao, Xianlin
    Xie, Bing
    2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024, 2024, : 1344 - 1349
  • [27] Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models
    Ghanbarzadeh, Somayeh
    Huang, Yan
    Palangi, Hamid
    Moreno, Radames Cruz
    Khanpour, Hamed
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5448 - 5458
  • [28] AttriPrompter: Auto-Prompting With Attribute Semantics for Zero-Shot Nuclei Detection via Visual-Language Pre-Trained Models
    Wu, Yongjian
    Zhou, Yang
    Saiyin, Jiya
    Wei, Bingzheng
    Lai, Maode
    Shou, Jianzhong
    Xu, Yan
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2025, 44 (02) : 982 - 993
  • [29] A Data Cartography based MixUp for Pre-trained Language Models
    Park, Seo Yeon
    Caragea, Cornelia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4244 - 4250
  • [30] MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models
    Miao, Yongzhu
    Li, Shasha
    Tang, Jintao
    Wang, Ting
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 25 - 30