Efficient Multimodal Fusion via Interactive Prompting

Cited by: 12
Authors
Li, Yaowei [1]
Quan, Ruijie [2]
Zhu, Linchao [2]
Yang, Yi [2]
Affiliations
[1] University of Technology Sydney, ReLER, AAII, Sydney, NSW, Australia
[2] Zhejiang University, CCAI, Hangzhou, People's Republic of China
Funding
Australian Research Council
Keywords
DOI
10.1109/CVPR52729.2023.00256
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multimodal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pretrained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. In addition, we disentangle vanilla prompts into three types in order to learn different optimization objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experimental results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.
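The abstract's point about adding prompt vectors only on the deep layers can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes a frozen backbone, trainable prompts inserted from one layer onward, and an equal activation cost per layer; the function names and the 12-layer example are hypothetical. Under those assumptions, gradients only need to flow from the loss back to the first layer that receives prompts, so earlier layers need not keep activations for the backward pass.

```python
from typing import List


def layers_in_backward_pass(num_layers: int, first_prompt_layer: int) -> List[int]:
    """Return 0-based indices of layers that gradients must flow through.

    Assumption: backbone weights are frozen and trainable prompts are fed
    into every layer from `first_prompt_layer` onward, so nothing upstream
    of that layer has parameters that need a gradient.
    """
    if not 0 <= first_prompt_layer < num_layers:
        raise ValueError("first_prompt_layer must be in [0, num_layers)")
    return list(range(first_prompt_layer, num_layers))


def activation_saving(num_layers: int, first_prompt_layer: int) -> float:
    """Fraction of layers whose activations need not be stored for backprop.

    Assumption (simplification): every layer stores the same amount of
    activation memory, which real models only approximate.
    """
    kept = len(layers_in_backward_pass(num_layers, first_prompt_layer))
    return 1.0 - kept / num_layers


# Hypothetical example: a 12-layer encoder with prompts from layer 6 onward.
example_layers = layers_in_backward_pass(12, 6)
example_saving = activation_saving(12, 6)
```

This only counts layers; it says nothing about the actual memory figures reported in the abstract, which depend on the models and settings used there.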
Pages: 2604-2613
Page count: 10