Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture

Authors
Filali, Hajar [1, 2]
Boulealam, Chafik [1 ]
El Fazazy, Khalid [1 ]
Mahraz, Adnane Mohamed [1 ]
Tairi, Hamid [1 ]
Riffi, Jamal [1 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Fac Sci Dhar El Mahraz, Dept Comp Sci, LISAC, Fes 30000, Morocco
[2] ISGA, Lab Innovat Management & Engn Enterprise LIMITE, Fes 30000, Morocco
Keywords
emotion recognition; deep learning; graph convolutional network; capsule network; vision transformer; meaningful neural network (MNN); multimodal architecture
DOI
10.3390/info16010040
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The development of emotionally intelligent computers depends on emotion recognition from rich multimodal inputs, such as text, speech, and visual cues, because the modalities complement one another. Complex relationships between modalities have been shown to be effective for emotion recognition, but they remain largely unexplored. Previous research on learning multimodal representations for emotion classification has relied mainly on fusion mechanisms that simply concatenate information, rather than fully exploiting the benefits of deep learning. This paper proposes a novel deep multimodal emotion model that uses the meaningful neural network (MNN) to learn meaningful multimodal representations while classifying data. Specifically, the proposed model concatenates multimodal inputs, using a graph convolutional network to extract the acoustic modality, a capsule network to generate the textual modality, and a vision transformer to acquire the visual modality. The MNN, whose effectiveness is already established, is used here as a methodological innovation: it is fed the previously generated vector parameters to produce better predictive results. Extensive experiments show that the suggested approach recognizes emotions more accurately, producing state-of-the-art results with accuracies of 69% and 56% on two public datasets, MELD and MOSEI, respectively.
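The concatenation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors below are random placeholders standing in for the outputs of the graph convolutional network, capsule network, and vision transformer, and a plain linear layer with softmax stands in for the MNN, whose internals the abstract does not describe. The function names, vector sizes, and number of classes are all hypothetical.

```python
import math
import random


def concat_fusion(acoustic, textual, visual):
    """Fuse per-modality feature vectors by simple concatenation."""
    return list(acoustic) + list(textual) + list(visual)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Linear layer plus softmax; an assumed stand-in for the MNN classifier."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


rng = random.Random(0)

# Placeholder features; in the paper these would come from the
# GCN (acoustic), capsule network (textual), and vision transformer (visual).
acoustic = [rng.uniform(-1, 1) for _ in range(4)]
textual = [rng.uniform(-1, 1) for _ in range(4)]
visual = [rng.uniform(-1, 1) for _ in range(4)]

fused = concat_fusion(acoustic, textual, visual)

# Hypothetical, untrained classifier parameters (3 illustrative classes).
num_classes = 3
weights = [[rng.uniform(-0.5, 0.5) for _ in fused] for _ in range(num_classes)]
biases = [0.0] * num_classes

probs = classify(fused, weights, biases)
predicted = max(range(num_classes), key=probs.__getitem__)
print(len(fused), predicted)
```

The fused vector simply has the combined length of the three inputs, so any downstream classifier sees all modalities at once; what the MNN adds on top of this, according to the abstract, is the learning of meaningful representations from those vectors.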
Pages: 22