Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture

Cited by: 0
Authors
Filali, Hajar [1 ,2 ]
Boulealam, Chafik [1 ]
El Fazazy, Khalid [1 ]
Mahraz, Adnane Mohamed [1 ]
Tairi, Hamid [1 ]
Riffi, Jamal [1 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Fac Sci Dhar El Mahraz, Dept Comp Sci, LISAC, Fes 30000, Morocco
[2] ISGA, Lab Innovat Management & Engn Enterprise LIMITE, Fes 30000, Morocco
Keywords
emotion recognition; deep learning; graph convolutional network; capsule network; vision transformer; meaningful neural network (MNN); multimodal architecture;
DOI
10.3390/info16010040
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
The development of emotionally intelligent computers depends on emotion recognition from richer multimodal inputs, such as text, speech, and visual cues, since multiple modalities complement one another. Although complex inter-modal relationships have been shown to benefit emotion recognition, they remain largely unexplored. Prior work on learning multimodal representations for emotion classification has mostly relied on fusion mechanisms that simply concatenate information, rather than fully exploiting the benefits of deep learning. In this paper, a novel deep multimodal emotion model is proposed that uses the meaningful neural network (MNN) to learn meaningful multimodal representations while classifying data. Specifically, the proposed model combines modality-specific encoders: a graph convolutional network extracts the acoustic modality, a capsule network encodes the textual modality, and a vision transformer captures the visual modality. Given the effectiveness of MNN, we employ it as a methodological innovation, feeding it the previously generated feature vectors to produce better predictions. Extensive experiments show that the proposed approach yields more accurate multimodal emotion recognition, achieving state-of-the-art accuracies of 69% and 56% on two public datasets, MELD and MOSEI, respectively.
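To make the pipeline described in the abstract concrete, the sketch below wires the three named encoders into a single PyTorch classifier. It is a minimal illustration, not the authors' implementation: all layer sizes are arbitrary assumptions, the graph over acoustic frames is assumed given, and a plain MLP stands in for the MNN head, whose internal design the abstract does not specify.

```python
# Hedged sketch of the three-branch late-fusion architecture (GCN for
# acoustic, capsule layer for text, small ViT for vision). Every size
# and the MLP fusion head are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    """One graph-convolution layer: aggregate neighbor features via a
    (row-normalized) adjacency matrix, then project and apply ReLU."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):          # x: (N, in_dim), adj: (N, N)
        return F.relu(self.lin(adj @ x))

class CapsuleLayer(nn.Module):
    """Primary-capsule projection with the squash nonlinearity that
    keeps capsule vector lengths in [0, 1]."""
    def __init__(self, in_dim, num_caps, cap_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, num_caps * cap_dim)
        self.num_caps, self.cap_dim = num_caps, cap_dim

    def forward(self, x):               # x: (B, in_dim)
        u = self.proj(x).view(-1, self.num_caps, self.cap_dim)
        norm2 = (u ** 2).sum(-1, keepdim=True)
        return (norm2 / (1 + norm2)) * u / torch.sqrt(norm2 + 1e-8)

class TinyViT(nn.Module):
    """A few transformer encoder layers over patch embeddings; the
    CLS token summarizes the visual input."""
    def __init__(self, patch_dim, emb_dim, num_patches, depth=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, emb_dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, emb_dim))
        self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, emb_dim))
        layer = nn.TransformerEncoderLayer(emb_dim, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, depth)

    def forward(self, patches):         # patches: (B, num_patches, patch_dim)
        x = self.embed(patches)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = self.enc(torch.cat([cls, x], dim=1) + self.pos)
        return x[:, 0]                  # CLS token as the visual vector

class MultimodalEmotionNet(nn.Module):
    def __init__(self, acou_dim, text_dim, patch_dim, num_patches, n_classes=7):
        super().__init__()
        self.gcn = GraphConv(acou_dim, 64)              # acoustic branch
        self.caps = CapsuleLayer(text_dim, 8, 16)       # textual branch
        self.vit = TinyViT(patch_dim, 64, num_patches)  # visual branch
        # Stand-in for the MNN classifier: an MLP over the fused vector.
        self.head = nn.Sequential(nn.Linear(64 + 8 * 16 + 64, 128),
                                  nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, acou, adj, text, patches):
        a = self.gcn(acou, adj).mean(0, keepdim=True)   # pool graph nodes
        a = a.expand(text.size(0), -1)                  # share across batch
        t = self.caps(text).flatten(1)
        v = self.vit(patches)
        return self.head(torch.cat([a, t, v], dim=-1))

# Smoke test with random tensors: 5 acoustic graph nodes, batch of 2.
model = MultimodalEmotionNet(acou_dim=40, text_dim=300, patch_dim=48, num_patches=16)
logits = model(torch.randn(5, 40), torch.eye(5),
               torch.randn(2, 300), torch.randn(2, 16, 48))  # -> (2, 7)
```

The design point the abstract stresses is late fusion: each modality is encoded by the architecture best suited to it, and only the resulting vectors are concatenated and passed to the MNN-style classifier.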
Pages: 22
Related Papers
50 records in total
  • [31] Bilevel Relational Graph Representation Learning-based Multimodal Emotion Recognition in Conversation
    Zhao, Huan
    Ju, Yi
    Gao, Yingxue
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME 2024, 2024
  • [32] Multimodal Emotion Recognition Using Compressed Graph Neural Networks
    Durkic, Tijana
    Simic, Nikola
    Suzic, Sinisa
    Bajovic, Dragana
    Peric, Zoran
    Delic, Vladan
    SPEECH AND COMPUTER, SPECOM 2024, PT II, 2025, 15300 : 109 - 121
  • [33] Bi-stream graph learning based multimodal fusion for emotion recognition in conversation
    Lu, Nannan
    Han, Zhiyuan
    Han, Min
    Qian, Jiansheng
    INFORMATION FUSION, 2024, 106
  • [34] FrameERC: Framelet Transform Based Multimodal Graph Neural Networks for Emotion Recognition in Conversation
    Li, Ming
    Shi, Jiandong
    Bai, Lu
    Huang, Changqin
    Jiang, Yunliang
    Lu, Ke
    Wang, Shijin
    Hancock, Edwin R.
    PATTERN RECOGNITION, 2025, 161
  • [35] Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition
    Wu, Yujin
    Daoudi, Mohamed
    Amad, Ali
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01) : 157 - 172
  • [36] Towards Learning a Joint Representation from Transformer in Multimodal Emotion Recognition
    Deng, James J.
    Leung, Clement H. C.
    BRAIN INFORMATICS, BI 2021, 2021, 12960 : 179 - 188
  • [37] Improving multimodal fusion with Main Modal Transformer for emotion recognition in conversation
    Zou, Shihao
    Huang, Xianying
    Shen, Xudong
    Liu, Hankai
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [38] EmoCaps: Emotion Capsule based Model for Conversational Emotion Recognition
    Li, Zaijing
    Tang, Fengxiao
    Zhao, Ming
    Zhu, Yusen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1610 - 1618
  • [39] TDFNet: Transformer-Based Deep-Scale Fusion Network for Multimodal Emotion Recognition
    Zhao, Zhengdao
    Wang, Yuhua
    Shen, Guang
    Xu, Yuezhu
    Zhang, Jiayuan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3771 - 3782
  • [40] Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation
    Zou, Shihao
    Huang, Xianying
    Shen, Xudong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5994 - 6003