MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition

Cited by: 0
Authors:
Qi, Xin [1 ]
Wen, Yujun [1 ]
Zhang, Pengzhou [1 ]
Huang, Heyan [2 ]
Affiliations:
[1] State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Keywords: Emotion Recognition
DOI:
10.1016/j.neucom.2024.128646
Abstract:
Speech emotion recognition (SER) is challenging owing to the complexity of emotional representation. This article therefore focuses on multimodal speech emotion recognition, which analyzes a speaker's emotional state from both audio signals and textual content. Existing multimodal approaches use sequential networks to capture temporal dependencies in the feature sequences but ignore the underlying relations within and between the acoustic and textual modalities. Moreover, current feature-level and decision-level fusion methods have unresolved limitations. This paper therefore develops a novel multimodal fusion graph convolutional network that comprehensively models information interactions within and between the two modalities. Specifically, we construct intra-modal relations to excavate the intrinsic characteristics exclusive to each modality. For inter-modal fusion, a multi-perspective fusion mechanism is devised to integrate the complementary information of the two modalities. Extensive experiments on the IEMOCAP and RAVDESS datasets demonstrate that our approach achieves superior performance.
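The abstract does not disclose the architecture's details, so the following is only an illustrative sketch of the general idea it describes: a graph convolution applied within each modality (intra-modal relations), followed by a second graph convolution over cross-modal edges (inter-modal fusion). All sizes, graph constructions, and variable names here are assumptions, not the authors' actual MFGCN design.

```python
import numpy as np

def gcn_layer(A, X, W):
    # One graph-convolution step: add self-loops, symmetrically
    # normalize the adjacency, then propagate and project (ReLU).
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
n_audio, n_text, d_feat, d_hid = 4, 3, 8, 5  # toy sizes (assumed)

# Node features: one node per audio frame / text token.
X_a = rng.standard_normal((n_audio, d_feat))
X_t = rng.standard_normal((n_text, d_feat))

# Intra-modal graphs: here simply fully connected within a modality.
A_a = np.ones((n_audio, n_audio)) - np.eye(n_audio)
A_t = np.ones((n_text, n_text)) - np.eye(n_text)

W1 = rng.standard_normal((d_feat, d_hid))
H_a = gcn_layer(A_a, X_a, W1)  # intra-modal acoustic representation
H_t = gcn_layer(A_t, X_t, W1)  # intra-modal textual representation

# Inter-modal fusion: one heterogeneous graph whose edges connect
# every audio node to every text node.
n = n_audio + n_text
A_cross = np.zeros((n, n))
A_cross[:n_audio, n_audio:] = 1.0
A_cross[n_audio:, :n_audio] = 1.0
H = np.vstack([H_a, H_t])
W2 = rng.standard_normal((d_hid, d_hid))
H_fused = gcn_layer(A_cross, H, W2)

# Utterance-level emotion embedding: mean-pool over all nodes.
emotion_vec = H_fused.mean(axis=0)
print(emotion_vec.shape)  # (5,)
```

In practice the fused embedding would feed a classifier over the emotion categories; the paper's multi-perspective fusion mechanism presumably goes well beyond the single cross-modal adjacency used here.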