Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion

Cited by: 37
Authors
Xie, Baijun [1 ]
Sidulova, Mariia [1 ]
Park, Chung Hyuk [1 ]
Institutions
[1] George Washington Univ, Sch Engn & Appl Sci, Dept Biomed Engn, Washington, DC 20052 USA
Funding
National Science Foundation (NSF), USA;
Keywords
multimodal emotion recognition; multimodal fusion; crossmodal transformer; attention mechanism; FACIAL EXPRESSION; SPEECH;
DOI
10.3390/s21144913
CLC classification
O65 [Analytical Chemistry];
Subject classification codes
070302; 081704;
Abstract
Decades of scientific research have been devoted to developing and evaluating methods for automated emotion recognition. With rapidly advancing technology, a wide range of emerging applications require recognition of the user's emotional state. This paper investigates a robust approach for multimodal emotion recognition during a conversation. Three separate models for the audio, video, and text modalities are structured and fine-tuned on the MELD dataset. In this paper, a transformer-based crossmodality fusion with the EmbraceNet architecture is employed to estimate the emotion. The proposed multimodal network architecture achieves up to 65% accuracy, significantly surpassing any of the unimodal models. We apply multiple evaluation techniques to show that our model is robust and can even outperform state-of-the-art models on the MELD dataset.
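The pipeline the abstract describes (unimodal encoders, crossmodal attention between modalities, then EmbraceNet-style fusion) can be sketched as a toy NumPy example. This is a minimal illustrative sketch, not the authors' implementation: all dimensions, weight initializations, and function names here are assumptions, and EmbraceNet's probabilistic docking is simplified to a uniform per-feature modality choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def crossmodal_attention(query, context, d, rng):
    # Single-head attention: the query modality attends to the context
    # modality. Random projection weights stand in for learned parameters.
    Wq = rng.standard_normal((query.shape[-1], d)) / np.sqrt(query.shape[-1])
    Wk = rng.standard_normal((context.shape[-1], d)) / np.sqrt(context.shape[-1])
    Wv = rng.standard_normal((context.shape[-1], d)) / np.sqrt(context.shape[-1])
    Q, K, V = query @ Wq, context @ Wk, context @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))          # (len_query, len_context)
    return attn @ V                               # (len_query, d)

def embracenet_style_fuse(features, rng):
    # Simplified EmbraceNet fusion: each modality is already "docked" to a
    # common dimension d; for every feature index, sample which modality
    # contributes that component (here with equal probability).
    stacked = np.stack(features)                  # (num_modalities, d)
    m, d = stacked.shape
    choice = rng.integers(0, m, size=d)           # one modality per dimension
    return stacked[choice, np.arange(d)]          # (d,)

# Toy unimodal feature sequences: text (5 tokens x 16), audio (8 frames x 12),
# video (6 frames x 20); sizes are arbitrary for illustration.
text  = rng.standard_normal((5, 16))
audio = rng.standard_normal((8, 12))
video = rng.standard_normal((6, 20))
d = 32

# Text attends to audio and to video; pool over the sequence axis.
text_audio = crossmodal_attention(text, audio, d, rng).mean(axis=0)
text_video = crossmodal_attention(text, video, d, rng).mean(axis=0)

fused = embracenet_style_fuse([text_audio, text_video], rng)
print(fused.shape)  # (32,) - this vector would feed an emotion classifier
```

In the paper's actual setting the projections are learned end-to-end and EmbraceNet's modality-selection probabilities can be adjusted (e.g., to drop a missing modality); the sketch only shows the data flow.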
Pages: 17
Related Papers
50 records in total
  • [1] Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion
    Siriwardhana, Shamane
    Kaluarachchi, Tharindu
    Billinghurst, Mark
    Nanayakkara, Suranga
    [J]. IEEE ACCESS, 2020, 8 : 176274 - 176285
  • [2] TDFNet: Transformer-Based Deep-Scale Fusion Network for Multimodal Emotion Recognition
    Zhao, Zhengdao
    Wang, Yuhua
    Shen, Guang
    Xu, Yuezhu
    Zhang, Jiayuan
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3771 - 3782
  • [3] Improving multimodal fusion with Main Modal Transformer for emotion recognition in conversation
    Zou, ShiHao
    Huang, Xianying
    Shen, XuDong
    Liu, Hankai
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [4] Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning
    Le, Hoai-Duy
    Lee, Guee-Sang
    Kim, Soo-Hyung
    Kim, Seungwon
    Yang, Hyung-Jeong
    [J]. IEEE ACCESS, 2023, 11 : 14742 - 14751
  • [5] MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
    Huang, Jian
    Tao, Jianhua
    Liu, Bin
    Lian, Zheng
    Niu, Mingyue
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3507 - 3511
  • [6] A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Zhang, Bo
    Zhang, Yijia
    Xu, Bo
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 776 - 788
  • [7] Multimodal transformer augmented fusion for speech emotion recognition
    Wang, Yuanyuan
    Gu, Yu
    Yin, Yifei
    Han, Yingping
    Zhang, He
    Wang, Shuang
    Li, Chenyu
    Quan, Dou
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [8] Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition
    Wu, Yujin
    Daoudi, Mohamed
    Amad, Ali
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01) : 157 - 172
  • [9] GraphMFT: A graph network based multimodal fusion technique for emotion recognition in conversation
    Li, Jiang
    Wang, Xiaoping
    Lv, Guoqing
    Zeng, Zhigang
    [J]. NEUROCOMPUTING, 2023, 550
  • [10] Multimodal Emotion Recognition in Conversation Based on Hypergraphs
    Li, Jiaze
    Mei, Hongyan
    Jia, Liyun
    Zhang, Xing
    [J]. ELECTRONICS, 2023, 12 (22)