Fusion with Hierarchical Graphs for Multimodal Emotion Recognition

Cited by: 0
Authors
Tang, Shuyun [1 ]
Luo, Zhaojie [2 ]
Nan, Guoshun [4 ]
Baba, Jun [3 ]
Yoshikawa, Yuichiro [2 ]
Ishiguro, Hiroshi [2 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA USA
[2] Osaka Univ, Osaka, Japan
[3] CyberAgent Inc, Tokyo, Japan
[4] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
DEEP
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Automatic emotion recognition (AER) based on enriched multimodal inputs, including text, speech, and visual cues, is crucial in the development of emotionally intelligent machines. Although complex modality relationships have been proven effective for AER, they are still largely underexplored because previous works predominantly relied on various fusion mechanisms with simply concatenated features to learn multimodal representations for emotion classification. This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations by considering the modality dependencies during the feature fusion procedure. Specifically, the proposed model fuses the multimodal inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation. We verified the interpretable capabilities of the proposed method by projecting the emotional states to a 2D valence-arousal (VA) subspace. Extensive experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets, IEMOCAP and MELD.
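The abstract describes a two-stage graph construction: modality nodes are first fused within each utterance, and the fused utterance nodes are then connected at the conversation level, with an additional projection onto a 2D valence-arousal subspace for interpretability. Below is a minimal, hypothetical PyTorch sketch of that idea; the class names, the fully connected modality and utterance graphs, the mean-pooling step, and the feature dimension are illustrative assumptions based only on the abstract, not the authors' HFGCN implementation.

```python
# Illustrative sketch (not the authors' code) of two-stage hierarchical graph
# fusion for multimodal emotion recognition, assuming utterance-level
# text/audio/visual features have already been extracted and projected.
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """Single graph-convolution layer: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # Symmetrically normalise the adjacency matrix (with self-loops).
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        return torch.relu(a_norm @ self.linear(h))


class HierarchicalFusionSketch(nn.Module):
    """Hypothetical two-stage fusion: stage 1 fuses the three modality nodes
    of each utterance, stage 2 propagates information across utterances."""
    def __init__(self, dim=128, num_emotions=6):
        super().__init__()
        self.stage1 = GCNLayer(dim, dim)   # intra-utterance modality graph
        self.stage2 = GCNLayer(dim, dim)   # conversation-level graph
        self.emotion_head = nn.Linear(dim, num_emotions)
        self.va_head = nn.Linear(dim, 2)   # 2D valence-arousal projection

    def forward(self, text, audio, visual):
        # text/audio/visual: (num_utterances, dim) projected features.
        n = text.size(0)
        # Stage 1: a fully connected 3-node graph per utterance (assumption).
        adj_m = torch.ones(3, 3, device=text.device)
        fused = []
        for i in range(n):
            nodes = torch.stack([text[i], audio[i], visual[i]])   # (3, dim)
            fused.append(self.stage1(nodes, adj_m).mean(dim=0))   # pool modalities
        h = torch.stack(fused)                                    # (n, dim)
        # Stage 2: fully connected conversation graph over utterances
        # (the paper likely uses a more structured graph; this is a placeholder).
        adj_u = torch.ones(n, n, device=text.device)
        h = self.stage2(h, adj_u)
        return self.emotion_head(h), self.va_head(h)


if __name__ == "__main__":
    model = HierarchicalFusionSketch()
    t, a, v = (torch.randn(5, 128) for _ in range(3))
    logits, va = model(t, a, v)
    print(logits.shape, va.shape)  # torch.Size([5, 6]) torch.Size([5, 2])
```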
Pages: 1288-1296
Page count: 9