Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 0
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technol, Cloud & AI, Shenzhen 518129, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; Multi-graph network; Dual-stream propagations; Multi-modal fusion; Emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, with graph-based models recently being widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph, and utterances in each modality are fully connected, potentially ignoring three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between the utterances. This module reduces redundancy by forcing each utterance to focus on the contextual ones that contribute to its emotion recognition, acting like a message propagating reducer to alleviate over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from multi-graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the above two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
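The paper's implementation is not reproduced in this record, but the two mechanisms the abstract describes — a structure learning module that sparsifies each modality's graph, and Dual-Stream Propagation with a gating unit — can be sketched in a few lines of NumPy. Everything below (the feature sizes, the 0.5 edge threshold, the gate parameterization, and the stand-in inter-modal stream) is an illustrative assumption, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy setup (all sizes are illustrative): 4 utterances in one
# modality, each with an 8-dim feature vector.
n_utt, d = 4, 8
h = rng.standard_normal((n_utt, d))

# Structure learning: instead of fully connecting utterances, score
# each pair and keep only confident edges, so every utterance attends
# only to the contextual utterances that matter to its emotion.
scores = sigmoid(h @ h.T)               # pairwise affinities in (0, 1)
adj = (scores > 0.5).astype(float)      # thresholded sparse adjacency
np.fill_diagonal(adj, 1.0)              # self-loops keep own features
adj /= adj.sum(axis=1, keepdims=True)   # row-normalize for propagation

# Dual-stream propagation: an intra-modal stream aggregates over the
# learned graph; the inter-modal stream (random stand-in here) would
# aggregate features from the other modalities' graphs.
h_intra = adj @ h
h_inter = rng.standard_normal((n_utt, d))

# Gating unit: a sigmoid gate adaptively mixes the two streams
# per utterance and per feature dimension.
W_g = rng.standard_normal((2 * d, d))   # gate weights (random here)
b_g = np.zeros(d)
g = sigmoid(np.concatenate([h_intra, h_inter], axis=1) @ W_g + b_g)
h_fused = g * h_intra + (1.0 - g) * h_inter
```

In a trained model `W_g` and `b_g` would be learned, and the edge scores would come from a learnable module rather than a raw dot product; the sketch only shows how sparsified propagation and gated fusion compose.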
Pages: 3987-3997
Page count: 11