Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion

Cited by: 61
Authors:
Siriwardhana, Shamane [1 ]
Kaluarachchi, Tharindu [1 ]
Billinghurst, Mark [2 ]
Nanayakkara, Suranga [1 ]
Affiliations:
[1] Univ Auckland, Auckland Bioengn Inst, Augmented Human Lab, Auckland 1010, New Zealand
[2] Univ Auckland, Auckland Bioengn Inst, Empath Comp Lab, Auckland 1010, New Zealand
Source:
IEEE ACCESS, 2020, Vol. 8
Keywords:
Feature extraction; Emotion recognition; Task analysis; Computational modeling; Bit error rate; Data models; Computer architecture; Multimodal emotion recognition; self-supervised learning; self-attention; transformer; BERT;
DOI:
10.1109/ACCESS.2020.3026823
Chinese Library Classification (CLC):
TP [Automation Technology, Computer Technology]
Discipline Code:
0812
Abstract:
Emotion recognition is a challenging research area given its complex nature: humans express emotional cues across multiple modalities such as language, facial expressions, and speech. Representation and fusion of features are the most crucial tasks in multimodal emotion recognition research. Self-Supervised Learning (SSL) has become a prominent and influential research direction in representation learning, and researchers now have access to pre-trained SSL models for different data modalities. In this paper, for the first time in the literature, we represent the three input modalities of text, audio (speech), and vision with features extracted from independently pre-trained SSL models. Given the high-dimensional nature of SSL features, we introduce a novel Transformer- and attention-based fusion mechanism that combines multimodal SSL features and achieves state-of-the-art results for the task of multimodal emotion recognition. We benchmark and evaluate our work to show that our model is robust and outperforms state-of-the-art models on four datasets.
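The fusion idea described in the abstract can be illustrated with a minimal sketch: per-modality SSL features are projected into a shared space, concatenated, and passed through a Transformer encoder whose self-attention mixes information across modalities before classification. This is an illustrative PyTorch sketch under assumptions, not the authors' implementation; the SSLFusion class name, feature dimensions, layer counts, mean pooling, and classifier head are all hypothetical choices.

# Illustrative sketch only (not the authors' released code): fusing
# per-modality self-supervised (SSL) features with a Transformer encoder.
# Dimensions, layer counts, pooling, and all names here are assumptions.
import torch
import torch.nn as nn

class SSLFusion(nn.Module):
    def __init__(self, text_dim=768, audio_dim=768, vision_dim=256,
                 d_model=256, n_heads=4, n_layers=2, n_classes=4):
        super().__init__()
        # Project each modality's SSL features into a shared space.
        self.proj_text = nn.Linear(text_dim, d_model)
        self.proj_audio = nn.Linear(audio_dim, d_model)
        self.proj_vision = nn.Linear(vision_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_feats, audio_feats, vision_feats):
        # Each input: (batch, seq_len, feat_dim) sequence of SSL features.
        tokens = torch.cat([self.proj_text(text_feats),
                            self.proj_audio(audio_feats),
                            self.proj_vision(vision_feats)], dim=1)
        fused = self.fusion(tokens)      # self-attention mixes all modalities
        pooled = fused.mean(dim=1)       # simple mean pooling over positions
        return self.classifier(pooled)   # emotion logits

# Toy usage with random stand-in features for text, speech, and face streams.
model = SSLFusion()
logits = model(torch.randn(2, 20, 768),
               torch.randn(2, 50, 768),
               torch.randn(2, 30, 256))
print(logits.shape)  # torch.Size([2, 4])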
Pages: 176274-176285
Page count: 12
Related Papers (50 records)
  • [1] Wu, Yujin; Daoudi, Mohamed; Amad, Ali. Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition. IEEE Transactions on Affective Computing, 2024, 15(1): 157-172.
  • [2] Vazquez-Rodriguez, Juan; Lefebvre, Gregoire; Cumin, Julien; Crowley, James L. Transformer-Based Self-Supervised Learning for Emotion Recognition. 2022 26th International Conference on Pattern Recognition (ICPR), 2022: 2605-2612.
  • [3] Sawant, Shrutika S.; Erick, F. X.; Arora, Pulkit; Pahl, Jaspar; Foltyn, Andreas; Holzer, Nina; Gotz, Theresa. Transformer-Based Self-Supervised Representation Learning for Emotion Recognition Using Bio-Signal Feature Fusion. 2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2023.
  • [4] Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk. Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion. Sensors, 2021, 21(14).
  • [5] Ma, Hui; Wang, Jian; Lin, Hongfei; Zhang, Bo; Zhang, Yijia; Xu, Bo. A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations. IEEE Transactions on Multimedia, 2024, 26: 776-788.
  • [6] Zhao, Zhengdao; Wang, Yuhua; Shen, Guang; Xu, Yuezhu; Zhang, Jiayuan. TDFNet: Transformer-Based Deep-Scale Fusion Network for Multimodal Emotion Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 3771-3782.
  • [7] Xu, Yurui; Wu, Xiao; Su, Hang; Liu, Xiaorui. Multimodal Emotion Recognition Based on Feature Fusion. 2022 International Conference on Advanced Robotics and Mechatronics (ICARM 2022), 2022: 7-11.
  • [8] Al-onazi, Badriyya B.; Nauman, Muhammad Asif; Jahangir, Rashid; Malik, Muhmmad Mohsin; Alkhammash, Eman H.; Elshewey, Ahmed M. Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion. Applied Sciences-Basel, 2022, 12(18).
  • [9] Le, Hoai-Duy; Lee, Guee-Sang; Kim, Soo-Hyung; Kim, Seungwon; Yang, Hyung-Jeong. Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning. IEEE Access, 2023, 11: 14742-14751.
  • [10] Alzamzami, Fatimah; El Saddik, Abdulmotaleb. Transformer-Based Feature Fusion Approach for Multimodal Visual Sentiment Recognition Using Tweets in the Wild. IEEE Access, 2023, 11: 47070-47079.