MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition

Citations: 0
Authors
Qi, Xin [1 ]
Wen, Yujun [1 ]
Zhang, Pengzhou [1 ]
Huang, Heyan [2 ]
Affiliations
[1] State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Keywords
Emotion Recognition
DOI
10.1016/j.neucom.2024.128646
Abstract
Speech emotion recognition (SER) is challenging owing to the complexity of emotional representation. Hence, this article focuses on multimodal speech emotion recognition, which analyzes the speaker's emotional state via audio signals and textual content. Existing multimodal approaches utilize sequential networks to capture the temporal dependency in various feature sequences, ignoring the underlying relations in the acoustic and textual modalities. Moreover, current feature-level and decision-level fusion methods have unresolved limitations. Therefore, this paper develops a novel multimodal fusion graph convolutional network that comprehensively executes information interactions within and between the two modalities. Specifically, we construct intra-modal relations to excavate the exclusive intrinsic characteristics of each modality. For inter-modal fusion, a multi-perspective fusion mechanism is devised to integrate the complementary information between the two modalities. Extensive experiments on the IEMOCAP and RAVDESS datasets demonstrate that our approach achieves superior performance. © 2024
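The abstract's core idea — graph convolution within each modality, followed by fusion of the two modalities — can be sketched minimally as follows. This is not the authors' MFGCN: the fully connected intra-modal graph, the feature dimensions, and concatenation as the fusion step are all illustrative assumptions standing in for the paper's intra-modal relation construction and multi-perspective fusion mechanism.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One standard graph-convolution step:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = np.diag(a_hat.sum(axis=1) ** -0.5)  # symmetric normalization
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
n_nodes, d_audio, d_text, d_hidden = 5, 8, 6, 4      # hypothetical sizes

# Intra-modal graphs: each utterance segment is a node; a fully
# connected graph is assumed here purely for illustration.
adj = np.ones((n_nodes, n_nodes)) - np.eye(n_nodes)
h_audio = gcn_layer(adj, rng.standard_normal((n_nodes, d_audio)),
                    rng.standard_normal((d_audio, d_hidden)))
h_text = gcn_layer(adj, rng.standard_normal((n_nodes, d_text)),
                   rng.standard_normal((d_text, d_hidden)))

# Inter-modal fusion: concatenate per-node embeddings from both modalities
# (a stand-in for the paper's multi-perspective fusion mechanism).
fused = np.concatenate([h_audio, h_text], axis=1)
print(fused.shape)  # (5, 8)
```

A downstream emotion classifier would then operate on `fused`; the paper's contribution lies in how the intra-modal graphs and the inter-modal fusion are actually constructed, which this sketch does not reproduce.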