MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition

Cited by: 0
Authors
Qi, Xin [1]
Wen, Yujun [1]
Zhang, Pengzhou [1]
Huang, Heyan [2]
Affiliations
[1] State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Keywords
Emotion Recognition
DOI
10.1016/j.neucom.2024.128646
Abstract
Speech emotion recognition (SER) is challenging owing to the complexity of emotional representation. Hence, this paper focuses on multimodal speech emotion recognition, which analyzes the speaker's emotional state via audio signals and textual content. Existing multimodal approaches use sequential networks to capture the temporal dependencies in the various feature sequences, ignoring the underlying relations within the acoustic and textual modalities. Moreover, current feature-level and decision-level fusion methods have unresolved limitations. Therefore, this paper develops a novel multimodal fusion graph convolutional network that comprehensively performs information interactions within and between the two modalities. Specifically, we construct intra-modal relations to extract the intrinsic characteristics exclusive to each modality. For inter-modal fusion, a multi-perspective fusion mechanism is devised to integrate the complementary information between the two modalities. Extensive experiments on the IEMOCAP and RAVDESS datasets demonstrate that our approach achieves superior performance. © 2024