Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

Cited by: 15
Authors
Liu, Pengfei [1 ]
Li, Kun [1 ]
Meng, Helen [2 ]
Affiliations
[1] SpeechX Ltd, Shenzhen, People's Republic of China
[2] Chinese Univ Hong Kong, Hong Kong, People's Republic of China
Keywords
multimodal emotion recognition; attention models; information fusion; neural networks; features
DOI
10.21437/Interspeech.2020-2067
Chinese Library Classification: R36 (Pathology); R76 (Otorhinolaryngology)
Subject Classification: 100104; 100213
Abstract
Emotion recognition is a challenging and actively studied research area that plays a critical role in emotion-aware human-computer interaction systems. In a multimodal setting, temporal alignment between modalities has not yet been well investigated. This paper presents a new model, the Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states to explicitly capture the alignment relationship between speech and text, and a novel group gated fusion (GGF) layer to integrate the representations of the different modalities. We empirically show that the attention-aligned representations significantly outperform the last hidden states of the LSTM, and that the proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
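As a rough illustration of the gated-fusion idea summarized in the abstract, the sketch below combines two modality representations with learned sigmoid gates. This is an assumption-laden toy in NumPy, not the authors' GGF implementation: the dimensions, random weights, and the exact gating formula are all placeholders for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8  # shared representation dimension (assumed for illustration)
h_speech = rng.standard_normal(d)  # stand-in for an aligned speech representation
h_text = rng.standard_normal(d)    # stand-in for an aligned text representation

# Gate parameters: randomly initialized here; learned jointly in the real model.
W_s = rng.standard_normal((d, 2 * d)) * 0.1
W_t = rng.standard_normal((d, 2 * d)) * 0.1

# Each modality's gate is conditioned on the concatenation of both modalities,
# so the network can weigh speech against text per dimension.
concat = np.concatenate([h_speech, h_text])
g_s = sigmoid(W_s @ concat)  # per-dimension gate for the speech branch
g_t = sigmoid(W_t @ concat)  # per-dimension gate for the text branch

# Gated combination of the two modality representations.
fused = g_s * h_speech + g_t * h_text
print(fused.shape)  # (8,)
```

The gating lets the model modulate, dimension by dimension, how much each modality contributes to the fused representation, rather than simply concatenating or averaging the two.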
Pages: 379-383 (5 pages)
Related Papers
(50 records in total)
  • [1] Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
    Liu, Xiaodong
    Li, Songyang
    Wang, Miao
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [2] Attention-based multimodal contextual fusion for sentiment and emotion classification using bidirectional LSTM
    Huddar, Mahesh G.
    Sannakki, Sanjeev S.
    Rajpurohit, Vijay S.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (09) : 13059 - 13076
  • [4] MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations
    Shi, Tao
    Huang, Shao-Lun
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14752 - 14766
  • [5] Attention-Based Multimodal Fusion for Video Description
    Hori, Chiori
    Hori, Takaaki
    Lee, Teng-Yok
    Zhang, Ziming
    Harsham, Bret
    Hershey, John R.
    Marks, Tim K.
    Sumi, Kazuhiko
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4203 - 4212
  • [6] A NOVEL ATTENTION-BASED GATED RECURRENT UNIT AND ITS EFFICACY IN SPEECH EMOTION RECOGNITION
    Rajamani, Srividya Tirunellai
    Rajamani, Kumar T.
    Mallol-Ragolta, Adria
    Liu, Shuo
    Schuller, Bjoern
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6294 - 6298
  • [7] Attention-Based Multimodal Fusion for Estimating Human Emotion in Real-World HRI
    Li, Yuanchao
    Zhao, Tianyu
    Shen, Xun
    [J]. HRI'20: COMPANION OF THE 2020 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2020, : 340 - 342
  • [8] Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features
    Mamieva, Dilnoza
    Abdusalomov, Akmalbek Bobomirzaevich
    Kutlimuratov, Alpamis
    Muminov, Bahodir
    Whangbo, Taeg Keun
    [J]. SENSORS, 2023, 23 (12)
  • [9] Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks
    Aslam, Ajwa
    Sargano, Allah Bux
    Habib, Zulfiqar
    [J]. APPLIED SOFT COMPUTING, 2023, 144
  • [10] Expression EEG Multimodal Emotion Recognition Method Based on the Bidirectional LSTM and Attention Mechanism
    Zhao, Yifeng
    Chen, Deyun
    [J]. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2021, 2021