GraphMFT: A graph network based multimodal fusion technique for emotion recognition in conversation

Cited: 9
Authors
Li, Jiang [1 ,2 ,3 ]
Wang, Xiaoping [1 ,2 ,3 ]
Lv, Guoqing [1 ,2 ,3 ]
Zeng, Zhigang [1 ,2 ,3 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Key Lab Image Proc & Intelligent Control, Educ Minist China, Wuhan 430074, Peoples R China
[3] Huazhong Univ Sci & Technol, Hubei Key Lab Brain Inspired Intelligent Syst, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal machine learning; Graph neural networks; Emotion recognition in conversation; Multimodal fusion;
DOI
10.1016/j.neucom.2023.126427
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multimodal machine learning is an emerging research area that has received a great deal of scholarly attention in recent years. To date, there have been few studies on multimodal Emotion Recognition in Conversation (ERC). Since Graph Neural Networks (GNNs) possess a powerful capacity for relational modeling, they have an inherent advantage in the field of multimodal learning. GNNs leverage a graph constructed from multimodal data to perform intra- and inter-modal information interaction, which effectively facilitates the integration and complementation of multimodal data. In this work, we propose a novel Graph network based Multimodal Fusion Technique (GraphMFT) for emotion recognition in conversation. Multimodal data can be modeled as a graph, where each data object is regarded as a node and both intra- and inter-modal dependencies between data objects are regarded as edges. GraphMFT utilizes multiple improved graph attention networks to capture intra-modal contextual information and inter-modal complementary information. In addition, the proposed GraphMFT addresses challenges of existing graph-based multimodal conversational emotion recognition models such as MMGCN. Empirical results on two public multimodal datasets show that our model outperforms state-of-the-art (SOTA) approaches, with accuracies of 67.90% and 61.30%. © 2023 Elsevier B.V. All rights reserved.
Pages: 11
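The abstract outlines the core construction: one node per (utterance, modality) pair, intra-modal edges linking utterances within a conversational context, and inter-modal edges linking the modalities of the same utterance, processed by stacked graph attention layers. Below is a minimal sketch of that construction, not the authors' implementation: it assumes a fixed past/future context window, substitutes PyTorch Geometric's stock GATConv for the paper's improved graph attention networks, and uses purely illustrative hyperparameters (window, dim, heads, layer count).

```python
# Sketch of a GraphMFT-style multimodal graph for ERC (assumptions noted above).
import torch
from torch_geometric.nn import GATConv

def build_graph(num_utts, num_modalities=3, window=2):
    """Nodes: one per (utterance, modality). Edges: intra-modal links between
    utterances within `window`, inter-modal links between the modalities of
    the same utterance."""
    def nid(u, m):  # node id for utterance u, modality m
        return m * num_utts + u
    src, dst = [], []
    for m in range(num_modalities):
        for u in range(num_utts):
            # intra-modal contextual edges (past and future window)
            for v in range(max(0, u - window), min(num_utts, u + window + 1)):
                src.append(nid(u, m)); dst.append(nid(v, m))
            # inter-modal complementary edges (same utterance, other modalities)
            for m2 in range(num_modalities):
                if m2 != m:
                    src.append(nid(u, m)); dst.append(nid(u, m2))
    return torch.tensor([src, dst], dtype=torch.long)

class MultimodalGAT(torch.nn.Module):
    """Stacked graph attention layers over the multimodal graph (stand-in for
    the paper's improved GATs)."""
    def __init__(self, dim=100, heads=4, num_layers=2):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            [GATConv(dim, dim, heads=heads, concat=False) for _ in range(num_layers)])

    def forward(self, x, edge_index):
        for conv in self.layers:
            # message passing over both intra- and inter-modal edges
            x = torch.relu(conv(x, edge_index))
        return x

# Usage: a 10-utterance dialogue with features stacked [text; audio; visual].
edge_index = build_graph(num_utts=10)      # 30 nodes in total
x = torch.randn(30, 100)                   # per-node utterance features
fused = MultimodalGAT()(x, edge_index)     # fused features for a classifier
```

In this construction, each utterance contributes three nodes, so repeated attention layers propagate contextual information along intra-modal edges and complementary information along inter-modal edges before the fused node features are fed to an emotion classifier.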
Related Papers
50 items in total
  • [1] Bi-stream graph learning based multimodal fusion for emotion recognition in conversation
    Lu, Nannan
    Han, Zhiyuan
    Han, Min
    Qian, Jiansheng
    [J]. INFORMATION FUSION, 2024, 106
  • [2] Multimodal Decoupled Distillation Graph Neural Network for Emotion Recognition in Conversation
    Dai, Yijing
    Li, Yingjian
    Chen, Dongpeng
    Li, Jinxing
    Lu, Guangming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9910 - 9924
  • [3] MMDAG: Multimodal Directed Acyclic Graph Network for Emotion Recognition in Conversation
    Xu, Shuo
    Jia, Yuxiang
    Niu, Changyong
    Zan, Hongying
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6802 - 6807
  • [4] MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition
    Qi, Xin
    Wen, Yujun
    Zhang, Pengzhou
    Huang, Heyan
    [J]. NEUROCOMPUTING, 2025, 611
  • [5] HiMul-LGG: A hierarchical decision fusion-based local–global graph neural network for multimodal emotion recognition in conversation
    Fu, Changzeng
    Qian, Fengkui
    Su, Kaifeng
    Su, Yikai
    Wang, Ze
    Shi, Jiaqi
    Liu, Zhigang
    Liu, Chaoran
    Ishi, Carlos Toshinori
    [J]. NEURAL NETWORKS, 2025, 181
  • [6] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao, Xiaopeng
    Zhang, Linying
    Chen, Qiuxian
    Ning, Hailong
    Dong, Yizhuo
    [J]. THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, 2024, 31 (06) : 16 - 25
  • [7] MULTIMODAL EMOTION RECOGNITION WITH CAPSULE GRAPH CONVOLUTIONAL BASED REPRESENTATION FUSION
    Liu, Jiaxing
    Chen, Sen
    Wang, Longbiao
    Liu, Zhilei
    Fu, Yahui
    Guo, Lili
    Dang, Jianwu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6339 - 6343
  • [8] A Contextual Attention Network for Multimodal Emotion Recognition in Conversation
    Wang, Tana
    Hou, Yaqing
    Zhou, Dongsheng
    Zhang, Qiang
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [9] Interactive Multimodal Attention Network for Emotion Recognition in Conversation
    Ren, Minjie
    Huang, Xiangdong
    Shi, Xiaoqi
    Nie, Weizhi
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1046 - 1050
  • [10] Multimodal Emotion Recognition in Conversation Based on Hypergraphs
    Li, Jiaze
    Mei, Hongyan
    Jia, Liyun
    Zhang, Xing
    [J]. ELECTRONICS, 2023, 12 (22)