Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture

Cited by: 0
Authors
Filali, Hajar [1 ,2 ]
Boulealam, Chafik [1 ]
El Fazazy, Khalid [1 ]
Mahraz, Adnane Mohamed [1 ]
Tairi, Hamid [1 ]
Riffi, Jamal [1 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Fac Sci Dhar El Mahraz, Dept Comp Sci, LISAC, Fes 30000, Morocco
[2] ISGA, Lab Innovat Management & Engn Enterprise LIMITE, Fes 30000, Morocco
Keywords
emotion recognition; deep learning; graph convolutional network; capsule network; vision transformer; meaningful neural network (MNN); multimodal architecture;
DOI
10.3390/info16010040
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The development of emotionally intelligent computers depends on emotion recognition from richer multimodal inputs, such as text, speech, and visual cues, since the modalities complement one another. Although complex inter-modality relationships have been shown to benefit emotion recognition, they remain largely unexplored: previous work on learning multimodal representations for emotion classification has mostly relied on fusion mechanisms that simply concatenate information, rather than fully exploiting the benefits of deep learning. In this paper, a deep multimodal emotion model is proposed that uses the meaningful neural network (MNN) to learn meaningful multimodal representations while classifying data. Specifically, the proposed model extracts the acoustic modality with a graph convolutional network, the textual modality with a capsule network, and the visual modality with a vision transformer, then concatenates the resulting representations. The MNN is used as a methodological innovation: it is fed the previously generated feature vectors to produce better predictions. Extensive experiments show that the proposed approach yields more accurate multimodal emotion recognition, achieving state-of-the-art accuracies of 69% and 56% on two public datasets, MELD and MOSEI, respectively.
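The fusion pipeline the abstract describes (three modality-specific encoders, concatenation, then a final classifier) can be illustrated with a minimal NumPy sketch. This is only a schematic: the embedding sizes, feature dimensions, and the `encode_stub` function are hypothetical placeholders, and random linear projections stand in for the paper's actual GCN, capsule network, ViT, and MNN components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding sizes for the three modality encoders and the label set.
D_AUDIO, D_TEXT, D_VISUAL, N_CLASSES = 128, 256, 192, 7

def encode_stub(x, out_dim, rng):
    """Placeholder for a learned modality encoder (GCN / capsule net / ViT):
    a random linear projection followed by a nonlinearity."""
    w = rng.standard_normal((x.shape[-1], out_dim)) / np.sqrt(x.shape[-1])
    return np.tanh(x @ w)

# Toy per-utterance raw features for each modality (dimensions are illustrative).
audio_feats  = rng.standard_normal((1, 40))   # e.g. acoustic descriptors
text_feats   = rng.standard_normal((1, 300))  # e.g. averaged word embeddings
visual_feats = rng.standard_normal((1, 512))  # e.g. frame-level features

# 1) Modality-specific encoders produce fixed-size embeddings.
a = encode_stub(audio_feats, D_AUDIO, rng)
t = encode_stub(text_feats, D_TEXT, rng)
v = encode_stub(visual_feats, D_VISUAL, rng)

# 2) Fusion by concatenation of the per-modality representations.
fused = np.concatenate([a, t, v], axis=-1)    # shape (1, 128 + 256 + 192)

# 3) A final classifier head (standing in for the MNN) maps the fused
#    vector to a distribution over emotion classes via a softmax.
w_out = rng.standard_normal((fused.shape[-1], N_CLASSES)) / np.sqrt(fused.shape[-1])
logits = fused @ w_out
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

print(fused.shape, probs.shape)  # (1, 576) (1, 7)
```

The key design point illustrated here is late fusion: each modality keeps its own encoder, and interaction between modalities happens only after concatenation, inside the final classification network.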
Pages: 22