GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Cited by: 22
Authors
Li, Jiang [1,2]
Wang, Xiaoping [1,2]
Lv, Guoqing [1,2]
Zeng, Zhigang [1,2]
Affiliations
[1] Huazhong Univ Sci & Technol, Educ Minist China, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Educ Minist China, Key Lab Image Proc & Intelligent Control, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition in conversation; multimodal fusion; graph neural networks; cross-modal feature complementation
DOI
10.1109/TMM.2023.3260635
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Emotion Recognition in Conversation (ERC) plays a significant role in Human-Computer Interaction (HCI) systems because it enables empathetic services. Multimodal ERC can mitigate the drawbacks of unimodal approaches. Recently, Graph Neural Networks (GNNs) have been widely adopted across a variety of fields owing to their superior performance in relation modeling. In multimodal ERC, GNNs can capture both long-distance contextual information and inter-modal interactive information. Unfortunately, because existing methods such as MMGCN fuse multiple modalities directly, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model both contextual and interactive information. GraphCFC alleviates the heterogeneity gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately during message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that GraphCFC outperforms state-of-the-art (SOTA) approaches.
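As a rough illustration of the GAT-MLP unit mentioned in the abstract, the sketch below pairs a graph-attention step with a node-wise MLP, each wrapped in a residual connection and layer normalization. This is an assumption-based toy example (the dimensions, normalization placement, class name GATMLPLayer, and edge construction are not taken from the paper), written with PyTorch and PyTorch Geometric.

```python
# Minimal, hypothetical sketch of a "GAT-MLP"-style layer: graph attention
# followed by a position-wise MLP, both with residual connections and LayerNorm.
# Not the authors' implementation; details are assumptions based on the abstract.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv


class GATMLPLayer(nn.Module):
    def __init__(self, dim: int, heads: int = 4, mlp_hidden: int = 256):
        super().__init__()
        # Multi-head graph attention; heads are averaged so the output keeps `dim`.
        self.gat = GATConv(dim, dim, heads=heads, concat=False)
        self.norm1 = nn.LayerNorm(dim)
        # Feed-forward network applied independently to every node.
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_hidden),
            nn.ReLU(),
            nn.Linear(mlp_hidden, dim),
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: [num_nodes, dim]        (e.g., one node per utterance-modality pair)
        # edge_index: [2, num_edges] (e.g., contextual and cross-modal edges)
        x = self.norm1(x + self.gat(x, edge_index))
        x = self.norm2(x + self.mlp(x))
        return x


if __name__ == "__main__":
    layer = GATMLPLayer(dim=64)
    nodes = torch.randn(6, 64)                  # 6 toy utterance-modality nodes
    edges = torch.tensor([[0, 1, 2, 3, 4, 5],
                          [1, 2, 3, 4, 5, 0]])  # toy directed edges
    print(layer(nodes, edges).shape)            # torch.Size([6, 64])
```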
Pages: 77-89
Page count: 13
Related papers (50 in total)
  • [41] FedCMD: A Federated Cross-modal Knowledge Distillation for Drivers' Emotion Recognition
    Bano, Saira
    Tonellotto, Nicola
    Cassara, Pietro
    Gotta, Alberto
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [42] Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images
    Tsai, Jia-Hua
    Chu, Wei-Ta
    PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA, MMASIA 2022, 2022
  • [43] AUDIOVISUAL EMOTION RECOGNITION VIA CROSS-MODAL ASSOCIATION IN KERNEL SPACE
    Wang, Yongjin
    Guan, Ling
    Venetsanopoulos, A. N.
    2011 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2011
  • [44] Mi-CGA: Cross-modal Graph Attention Network for robust emotion recognition in the presence of incomplete modalities
    Nguyen, Cam-Van Thi
    Kieu, Hai-Dang
    Ha, Quang-Thuy
    Phan, Xuan-Hieu
    Le, Duc-Trong
    NEUROCOMPUTING, 2025, 623
  • [45] 'What' and 'Where' both matter: dual cross-modal graph convolutional networks for multimodal named entity recognition
    Zhang, Zhengxuan
    Chen, Jianying
    Liu, Xuejie
    Mai, Weixing
    Cai, Qianhua
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (06) : 2399 - 2409
  • [46] Multimodal Emotion Recognition Using Feature Fusion: An LLM-Based Approach
    Chandraumakantham, Omkumar
    Gowtham, N.
    Zakariah, Mohammed
    Almazyad, Abdulaziz
    IEEE ACCESS, 2024, 12 : 108052 - 108071
  • [47] Knowledge graph embedding by fusing multimodal content via cross-modal learning
    Liu, Shi
    Li, Kaiyang
    Wang, Yaoying
    Zhu, Tianyou
    Li, Jiwei
    Chen, Zhenyu
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14180 - 14200
  • [48] Multi-corpus emotion recognition method based on cross-modal gated attention fusion
    Ryumina, Elena
    Ryumin, Dmitry
    Axyonov, Alexandr
    Ivanko, Denis
    Karpov, Alexey
    PATTERN RECOGNITION LETTERS, 2025, 190 : 192 - 200
  • [49] Large Margin Coupled Feature Learning for Cross-Modal Face Recognition
    Jin, Yi
    Lu, Jiwen
    Ruan, Qiuqi
    2015 INTERNATIONAL CONFERENCE ON BIOMETRICS (ICB), 2015, : 286 - 292
  • [50] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2432 - 2444