Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture

Cited: 0
Authors
Filali, Hajar [1 ,2 ]
Boulealam, Chafik [1 ]
El Fazazy, Khalid [1 ]
Mahraz, Adnane Mohamed [1 ]
Tairi, Hamid [1 ]
Riffi, Jamal [1 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Fac Sci Dhar El Mahraz, Dept Comp Sci, LISAC, Fes 30000, Morocco
[2] ISGA, Lab Innovat Management & Engn Enterprise LIMITE, Fes 30000, Morocco
Keywords
emotion recognition; deep learning; graph convolutional network; capsule network; vision transformer; meaningful neural network (MNN); multimodal architecture;
DOI
10.3390/info16010040
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The development of emotionally intelligent computers depends on emotion recognition from rich multimodal inputs, such as text, speech, and visual cues, since the modalities complement one another. Although the value of modeling complex inter-modality relationships for emotion recognition has been demonstrated, these relationships remain largely unexplored. Previous work on learning multimodal representations for emotion classification has relied mainly on fusion mechanisms that simply concatenate information, rather than fully exploiting the capabilities of deep learning. In this paper, a deep multimodal emotion model is proposed that uses the meaningful neural network (MNN) to learn meaningful multimodal representations while classifying data. Specifically, the proposed model combines modality-specific inputs: a graph convolutional network extracts the acoustic modality, a capsule network encodes the textual modality, and a vision transformer captures the visual modality. Building on the effectiveness of the MNN, we use it as the methodological core of the architecture: it is fed the previously generated feature vectors to produce stronger predictions. Extensive experiments show that the proposed approach yields more accurate multimodal emotion recognition, achieving state-of-the-art results with accuracies of 69% and 56% on two public datasets, MELD and MOSEI, respectively.
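For illustration only, the following is a minimal PyTorch sketch of the pipeline the abstract describes: a graph convolutional layer for acoustic frames, a simplified capsule projection (squash nonlinearity, dynamic routing omitted) for text, a small transformer encoder over image patches, and a plain feed-forward head standing in for the MNN fusion stage, whose internals are not specified in the abstract. All module names, feature dimensions, and the feed-forward stand-in are assumptions, not the authors' implementation.

```python
# Minimal sketch of the three-branch multimodal pipeline (illustrative assumptions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step over acoustic frame nodes: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, nodes, in_dim); adj: (batch, nodes, nodes), row-normalized
        return F.relu(torch.bmm(adj, self.linear(x)))


class SimpleCapsuleLayer(nn.Module):
    """Projects pooled text features into capsule vectors and applies the squash
    nonlinearity (dynamic routing omitted for brevity)."""
    def __init__(self, in_dim, num_capsules, capsule_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, num_capsules * capsule_dim)
        self.num_capsules, self.capsule_dim = num_capsules, capsule_dim

    def forward(self, x):
        u = self.proj(x).view(-1, self.num_capsules, self.capsule_dim)
        norm_sq = (u ** 2).sum(dim=-1, keepdim=True)
        return (norm_sq / (1.0 + norm_sq)) * u / torch.sqrt(norm_sq + 1e-8)


class TinyViT(nn.Module):
    """Patch embedding followed by a small transformer encoder over image patches."""
    def __init__(self, patch_dim, embed_dim, num_layers=2, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, patches):
        # patches: (batch, num_patches, patch_dim); mean-pool the encoded patch tokens
        return self.encoder(self.embed(patches)).mean(dim=1)


class MultimodalEmotionModel(nn.Module):
    """Concatenates the three modality vectors and classifies with a feed-forward head
    that stands in for the MNN fusion stage."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.acoustic = SimpleGCNLayer(in_dim=74, out_dim=128)   # hypothetical 74-dim acoustic frames
        self.text = SimpleCapsuleLayer(in_dim=300, num_capsules=8, capsule_dim=16)
        self.visual = TinyViT(patch_dim=48, embed_dim=128)
        self.fusion = nn.Sequential(                              # MNN stand-in (assumption)
            nn.Linear(128 + 8 * 16 + 128, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )

    def forward(self, audio_nodes, audio_adj, text_feat, image_patches):
        a = self.acoustic(audio_nodes, audio_adj).mean(dim=1)     # pool graph nodes
        t = self.text(text_feat).flatten(start_dim=1)             # flatten capsule vectors
        v = self.visual(image_patches)
        return self.fusion(torch.cat([a, t, v], dim=-1))          # emotion logits


if __name__ == "__main__":
    model = MultimodalEmotionModel(num_classes=7)   # MELD distinguishes 7 emotion classes
    audio = torch.randn(2, 20, 74)                  # 20 acoustic frames per utterance
    adj = torch.softmax(torch.randn(2, 20, 20), dim=-1)
    text = torch.randn(2, 300)                      # pooled word embeddings
    patches = torch.randn(2, 36, 48)                # 36 flattened image patches
    print(model(audio, adj, text, patches).shape)   # -> torch.Size([2, 7])
```

The feed-forward fusion head is only a placeholder for the MNN described in the paper; the branch encoders are likewise reduced to single layers so the end-to-end data flow of the concatenation-then-classify design stays visible.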
Pages: 22
Related Papers (50 in total; items [41]–[50] listed)
  • [41] Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture
    Song, Yu
    Zhou, Qi
    APPLIED ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [42] Bridge Graph Attention Based Graph Convolution Network With Multi-Scale Transformer for EEG Emotion Recognition
    Yan, Huachao
    Guo, Kailing
    Xing, Xiaofen
    Xu, Xiangmin
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (04) : 2042 - 2054
  • [43] EEG-fNIRS-Based Emotion Recognition Using Graph Convolution and Capsule Attention Network
    Chen, Guijun
    Liu, Yue
    Zhang, Xueying
    BRAIN SCIENCES, 2024, 14 (08)
  • [44] Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning
    Le, Hoai-Duy
    Lee, Guee-Sang
    Kim, Soo-Hyung
    Kim, Seungwon
    Yang, Hyung-Jeong
    IEEE ACCESS, 2023, 11 : 14742 - 14751
  • [45] Multimodal Emotion Recognition Based on the Decoupling of Emotion and Speaker Information
    Gajsek, Rok
    Struc, Vitomir
    Mihelic, France
    TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 275 - 282
  • [46] CAPSULE TRANSFORMER NETWORK FOR DYNAMIC HAND GESTURE RECOGNITION USING MULTIMODAL DATA
    Lebas, Alexandre
    Slama, Rim
    Wannous, Hazem
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2130 - 2134
  • [47] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao Xiaopeng
    Zhang Linying
    Chen Qiuxian
    Ning Hailong
    Dong Yizhuo
    The Journal of China Universities of Posts and Telecommunications, 2024, 31 (06) : 16 - 25
  • [48] Multimodal graph learning with framelet-based stochastic configuration networks for emotion recognition in conversation
    Shi, Jiandong
    Li, Ming
    Chen, Yuting
    Cui, Lixin
    Bai, Lu
    INFORMATION SCIENCES, 2025, 686
  • [49] Pedestrian Attribute Recognition Based on Multimodal Transformer
    Liu, Dan
    Song, Wei
    Zhao, Xiaobing
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 422 - 433
  • [50] DGSNet: Dual Graph Structure Network for Emotion Recognition in Multimodal Conversations
    Tang, Shimin
    Wang, Changjian
    Tian, Fengyu
    Xu, Kele
    Xu, Minpeng
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 78 - 85