Fusion with Hierarchical Graphs for Multimodal Emotion Recognition

Cited by: 0
Authors
Tang, Shuyun [1 ]
Luo, Zhaojie [2 ]
Nan, Guoshun [4 ]
Baba, Jun [3 ]
Yoshikawa, Yuichiro [2 ]
Ishiguro, Hiroshi [2 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA USA
[2] Osaka Univ, Osaka, Japan
[3] CyberAgent Inc, Tokyo, Japan
[4] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
DEEP;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic emotion recognition (AER) based on enriched multimodal inputs, including text, speech, and visual cues, is crucial in the development of emotionally intelligent machines. Although complex modality relationships have been proven effective for AER, they are still largely underexplored because previous works predominantly relied on various fusion mechanisms with simply concatenated features to learn multimodal representations for emotion classification. This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations by considering the modality dependencies during the feature fusion procedure. Specifically, the proposed model fuses multimodal inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation. We verified the interpretability of the proposed method by projecting the emotional states to a 2D valence-arousal (VA) subspace. Extensive experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets, IEMOCAP and MELD.
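The core idea in the abstract (fusing modalities via graph construction and graph convolution rather than plain feature concatenation) can be illustrated with a minimal sketch. This is not the authors' HFGCN implementation: the feature dimension, the fully connected modality graph, and the single-layer GCN are all illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(0)

# One node per modality for a single utterance; features assumed
# already projected to a common dimension d=8 (hypothetical).
text, speech, visual = rng.normal(size=(3, 8))
H = np.stack([text, speech, visual])

# Fully connect the modality nodes so cross-modal dependencies are
# encoded by message passing instead of simple concatenation.
A = np.ones((3, 3)) - np.eye(3)

W = 0.1 * rng.normal(size=(8, 8))               # learnable weights in practice
H_fused = gcn_layer(A, H, W)
utterance_repr = H_fused.mean(axis=0)           # pooled multimodal representation
print(utterance_repr.shape)                     # (8,)
```

In the paper's two-stage construction, such modality-level nodes would additionally be linked across utterances to encode conversational context; here only the first (intra-utterance) stage is sketched.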
Pages: 1288-1296
Page count: 9