GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Cited by: 22
Authors
Li, Jiang [1,2]
Wang, Xiaoping [1,2]
Lv, Guoqing [1,2]
Zeng, Zhigang [1,2]
Affiliations
[1] Huazhong Univ Sci & Technol, Educ Minist China, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Educ Minist China, Key Lab Image Proc & Intelligent Control, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition in conversation; multimodal fusion; graph neural networks; cross-modal feature complementation
DOI
10.1109/TMM.2023.3260635
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Emotion Recognition in Conversation (ERC) plays a significant role in Human-Computer Interaction (HCI) systems because it enables empathetic services. Multimodal ERC can mitigate the drawbacks of unimodal approaches. Recently, Graph Neural Networks (GNNs) have been widely adopted across a variety of fields owing to their superior performance in relation modeling. In multimodal ERC, GNNs can capture both long-distance contextual information and inter-modal interactive information. Unfortunately, because existing methods such as MMGCN fuse multiple modalities directly, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model both contextual and interactive information. GraphCFC alleviates the heterogeneity gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately during message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that GraphCFC outperforms state-of-the-art (SOTA) approaches.
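As a rough illustration of the GAT-MLP unit mentioned in the abstract, the sketch below pairs a graph-attention step with a node-wise MLP, each wrapped in a residual connection and layer normalization. This is an assumption-based toy example (the dimensions, normalization placement, class name GATMLPLayer, and edge construction are not taken from the paper), written with PyTorch and PyTorch Geometric.

```python
# Minimal, hypothetical sketch of a "GAT-MLP"-style layer: graph attention
# followed by a position-wise MLP, both with residual connections and LayerNorm.
# Not the authors' implementation; details are assumptions based on the abstract.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv


class GATMLPLayer(nn.Module):
    def __init__(self, dim: int, heads: int = 4, mlp_hidden: int = 256):
        super().__init__()
        # Multi-head graph attention; heads are averaged so the output keeps `dim`.
        self.gat = GATConv(dim, dim, heads=heads, concat=False)
        self.norm1 = nn.LayerNorm(dim)
        # Feed-forward network applied independently to every node.
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_hidden),
            nn.ReLU(),
            nn.Linear(mlp_hidden, dim),
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: [num_nodes, dim]        (e.g., one node per utterance-modality pair)
        # edge_index: [2, num_edges] (e.g., contextual and cross-modal edges)
        x = self.norm1(x + self.gat(x, edge_index))
        x = self.norm2(x + self.mlp(x))
        return x


if __name__ == "__main__":
    layer = GATMLPLayer(dim=64)
    nodes = torch.randn(6, 64)                  # 6 toy utterance-modality nodes
    edges = torch.tensor([[0, 1, 2, 3, 4, 5],
                          [1, 2, 3, 4, 5, 0]])  # toy directed edges
    print(layer(nodes, edges).shape)            # torch.Size([6, 64])
```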
Pages: 77-89
Page count: 13
Related papers (50 in total)
  • [41] FedCMD: A Federated Cross-modal Knowledge Distillation for Drivers' Emotion Recognition
    Bano, Saira
    Tonellotto, Nicola
    Cassara, Pietro
    Gotta, Alberto
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [42] Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images
    Tsai, Jia-Hua
    Chu, Wei-Ta
    PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA, MMASIA 2022, 2022
  • [43] AUDIOVISUAL EMOTION RECOGNITION VIA CROSS-MODAL ASSOCIATION IN KERNEL SPACE
    Wang, Yongjin
    Guan, Ling
    Venetsanopoulos, A. N.
    2011 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2011
  • [44] Mi-CGA: Cross-modal Graph Attention Network for robust emotion recognition in the presence of incomplete modalities
    Nguyen, Cam-Van Thi
    Kieu, Hai-Dang
    Ha, Quang-Thuy
    Phan, Xuan-Hieu
    Le, Duc-Trong
    NEUROCOMPUTING, 2025, 623
  • [45] 'What' and 'Where' both matter: dual cross-modal graph convolutional networks for multimodal named entity recognition
    Zhang, Zhengxuan
    Chen, Jianying
    Liu, Xuejie
    Mai, Weixing
    Cai, Qianhua
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (06) : 2399 - 2409
  • [46] Multimodal Emotion Recognition Using Feature Fusion: An LLM-Based Approach
    Chandraumakantham, Omkumar
    Gowtham, N.
    Zakariah, Mohammed
    Almazyad, Abdulaziz
    IEEE ACCESS, 2024, 12 : 108052 - 108071
  • [47] Knowledge graph embedding by fusing multimodal content via cross-modal learning
    Liu, Shi
    Li, Kaiyang
    Wang, Yaoying
    Zhu, Tianyou
    Li, Jiwei
    Chen, Zhenyu
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14180 - 14200
  • [48] Multi-corpus emotion recognition method based on cross-modal gated attention fusion
    Ryumina, Elena
    Ryumin, Dmitry
    Axyonov, Alexandr
    Ivanko, Denis
    Karpov, Alexey
    PATTERN RECOGNITION LETTERS, 2025, 190 : 192 - 200
  • [49] Large Margin Coupled Feature Learning for Cross-Modal Face Recognition
    Jin, Yi
    Lu, Jiwen
    Ruan, Qiuqi
    2015 INTERNATIONAL CONFERENCE ON BIOMETRICS (ICB), 2015, : 286 - 292
  • [50] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2432 - 2444