MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition

Cited by: 0
Authors:
Qi, Xin [1 ]
Wen, Yujun [1 ]
Zhang, Pengzhou [1 ]
Huang, Heyan [2 ]
Affiliations:
[1] State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Keywords: Emotion Recognition
DOI:
10.1016/j.neucom.2024.128646
Abstract:
Speech emotion recognition (SER) is challenging owing to the complexity of emotional representation. This article therefore focuses on multimodal speech emotion recognition, which analyzes a speaker's emotional state from both audio signals and textual content. Existing multimodal approaches use sequential networks to capture temporal dependencies in the feature sequences but ignore the underlying relations within and between the acoustic and textual modalities. Moreover, current feature-level and decision-level fusion methods have unresolved limitations. This paper therefore develops a novel multimodal fusion graph convolutional network that comprehensively models information interactions within and between the two modalities. Specifically, we construct intra-modal relations to excavate the intrinsic characteristics exclusive to each modality. For inter-modal fusion, a multi-perspective fusion mechanism is devised to integrate the complementary information of the two modalities. Extensive experiments on the IEMOCAP and RAVDESS datasets demonstrate that our approach achieves superior performance.
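The abstract does not disclose the architecture's details, so the following is only an illustrative sketch of the general idea it describes: a graph convolution applied within each modality (intra-modal relations), followed by a second graph convolution over cross-modal edges (inter-modal fusion). All sizes, graph constructions, and variable names here are assumptions, not the authors' actual MFGCN design.

```python
import numpy as np

def gcn_layer(A, X, W):
    # One graph-convolution step: add self-loops, symmetrically
    # normalize the adjacency, then propagate and project (ReLU).
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
n_audio, n_text, d_feat, d_hid = 4, 3, 8, 5  # toy sizes (assumed)

# Node features: one node per audio frame / text token.
X_a = rng.standard_normal((n_audio, d_feat))
X_t = rng.standard_normal((n_text, d_feat))

# Intra-modal graphs: here simply fully connected within a modality.
A_a = np.ones((n_audio, n_audio)) - np.eye(n_audio)
A_t = np.ones((n_text, n_text)) - np.eye(n_text)

W1 = rng.standard_normal((d_feat, d_hid))
H_a = gcn_layer(A_a, X_a, W1)  # intra-modal acoustic representation
H_t = gcn_layer(A_t, X_t, W1)  # intra-modal textual representation

# Inter-modal fusion: one heterogeneous graph whose edges connect
# every audio node to every text node.
n = n_audio + n_text
A_cross = np.zeros((n, n))
A_cross[:n_audio, n_audio:] = 1.0
A_cross[n_audio:, :n_audio] = 1.0
H = np.vstack([H_a, H_t])
W2 = rng.standard_normal((d_hid, d_hid))
H_fused = gcn_layer(A_cross, H, W2)

# Utterance-level emotion embedding: mean-pool over all nodes.
emotion_vec = H_fused.mean(axis=0)
print(emotion_vec.shape)  # (5,)
```

In practice the fused embedding would feed a classifier over the emotion categories; the paper's multi-perspective fusion mechanism presumably goes well beyond the single cross-modal adjacency used here.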