Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers

Cited by: 0
Authors
Duseja, Tejas [1 ]
Annervaz, K. M. [1 ]
Duggani, Jeevithiesh [1 ]
Zacharia, Shyam [2 ]
Free, Michael [3 ]
Dukkipati, Ambedkar [1 ]
Affiliations
[1] Indian Inst Sci, Bengaluru, India
[2] British Telecom, Bengaluru, India
[3] British Telecom, London, England
Keywords
Multi-modal emotion recognition; sentiment analysis; pre-trained models
DOI
10.1109/MIPR62202.2024.00070
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Transformer models, which underpin today's foundation models, are now ubiquitous, and there has been a surge in pre-trained transformers that can be fine-tuned for different downstream tasks. Most pre-trained transformers, however, are trained on a single modality, and there is no direct way to fine-tune them on multiple modalities. To tackle this issue, in this paper we propose a general-purpose gate, SSIM (Switch off, Switch on, and Integrate Modalities), by which other modalities can be integrated into large pre-trained language transformers. The proposed SSIM gate obtains a unified representation by soft-switching between multi-modal interactions. To evaluate our approach, we established benchmarks using pre-trained language transformers such as BERT, XLNet, and T5 on multi-modal tasks including sentiment and emotion analysis (CMU-MOSI, CMU-MOSEI), emotion recognition in conversations (IEMOCAP, MELD), and multimodal intent recognition (MIntRec), achieving results close to the state of the art.
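The abstract describes SSIM only at a high level and does not give the gate's exact formulation. As an illustrative aid, below is a minimal PyTorch sketch of a generic soft-switching gate that fuses a secondary modality (e.g., audio features) into the hidden states of a pre-trained language transformer. The module name SSIMGate, the sigmoid gating form, and the projection layers are assumptions made for illustration, not the authors' published implementation.

```python
# Illustrative sketch only: a generic soft-switching gate for injecting a
# secondary modality into the hidden states of a pre-trained language
# transformer. The paper's actual SSIM formulation may differ; all layer
# choices here are assumptions.
import torch
import torch.nn as nn


class SSIMGate(nn.Module):
    """Soft-switch between text-only and fused text+modality representations."""

    def __init__(self, text_dim: int, modality_dim: int):
        super().__init__()
        self.proj = nn.Linear(modality_dim, text_dim)          # map modality into text space
        self.gate = nn.Linear(text_dim + text_dim, text_dim)   # per-dimension soft switch

    def forward(self, text_h: torch.Tensor, modality_x: torch.Tensor) -> torch.Tensor:
        # text_h:     (batch, seq_len, text_dim) hidden states from the language model
        # modality_x: (batch, seq_len, modality_dim) time-aligned features of the other modality
        m = torch.tanh(self.proj(modality_x))
        g = torch.sigmoid(self.gate(torch.cat([text_h, m], dim=-1)))
        # g near 0 switches the modality off (text-only); g near 1 switches it on (fused).
        return (1.0 - g) * text_h + g * (text_h + m)


# Usage sketch: fuse hypothetical 74-dim audio features into 768-dim BERT hidden states.
if __name__ == "__main__":
    gate = SSIMGate(text_dim=768, modality_dim=74)
    text_h = torch.randn(2, 16, 768)
    audio_x = torch.randn(2, 16, 74)
    fused = gate(text_h, audio_x)
    print(fused.shape)  # torch.Size([2, 16, 768])
```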
Pages: 403-409
Page count: 7