Mutual Cross-Attention in Dyadic Fusion Networks for Audio-Video Emotion Recognition

Cited by: 0
Authors
Luo, Jiachen [1 ]
Phan, Huy [2 ]
Wang, Lin [1 ]
Reiss, Joshua [1 ]
Affiliations
[1] Queen Mary Univ London, Ctr Digital Mus, London, England
[2] Amazon Alexa, Cambridge, MA USA
Source
2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW | 2023
Keywords
affective computing; modality fusion; attention mechanism; deep learning; FEATURES; DATABASES; MODELS;
DOI
10.1109/ACIIW59127.2023.10388147
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal emotion recognition is a challenging problem in the research fields of human-computer interaction and pattern recognition. Efficiently finding a common sub-space for heterogeneous multimodal data remains an open problem in audio-video emotion recognition. In this work, we propose an attentive audio-video fusion network for an emotional dialogue system that learns attentive contextual dependency, speaker information, and the interaction between the audio and video modalities. We employ the pre-trained models wav2vec and Distract your Attention Network (DAN) to extract high-level audio and video representations, respectively. By using weighted fusion based on a cross-attention module, the cross-modality encoder focuses on inter-modality relations and selectively captures effective information across the audio and video modalities. Specifically, bidirectional gated recurrent unit (BiGRU) models capture long-term contextual information, explore speaker influence, and learn intra- and inter-modal interactions of the audio and video modalities in a dynamic manner. We evaluate the approach on the MELD dataset, and the experimental results show that it achieves state-of-the-art performance.
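The weighted cross-attention fusion the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: each modality's features serve as queries attending over the other modality, and the attended context is blended back with the original features via a scalar weight. The sequence lengths, the 64-dimensional features standing in for wav2vec/DAN embeddings, and the `alpha` mixing weight are all hypothetical, and the BiGRU context modelling from the paper is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # query: (T_q, d), context: (T_c, d).
    # Scaled dot-product attention: each query frame attends over all context frames.
    d_k = query.shape[-1]
    scores = query @ context.T / np.sqrt(d_k)   # (T_q, T_c)
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ context                    # (T_q, d)

def mutual_cross_attention_fusion(audio, video, alpha=0.5):
    # Mutual direction: audio attends over video, and video attends over audio.
    audio_attended = cross_attention(audio, video)
    video_attended = cross_attention(video, audio)
    # Weighted (residual-style) fusion of each modality with its attended context.
    fused_audio = alpha * audio + (1.0 - alpha) * audio_attended
    fused_video = alpha * video + (1.0 - alpha) * video_attended
    return fused_audio, fused_video

rng = np.random.default_rng(0)
audio = rng.standard_normal((10, 64))  # e.g. 10 audio frames of 64-d features
video = rng.standard_normal((8, 64))   # e.g. 8 video frames of 64-d features
fa, fv = mutual_cross_attention_fusion(audio, video)
print(fa.shape, fv.shape)  # (10, 64) (8, 64)
```

The fused sequences keep each modality's own temporal length, so downstream recurrent models (such as the BiGRUs in the paper) can consume them per modality before classification.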
Pages: 7
Related Papers (50 total)
  • [11] Speech Emotion Recognition Using Dual-Stream Representation and Cross-Attention Fusion
    Yu, Shaode
    Meng, Jiajian
    Fan, Wenqing
    Chen, Ye
    Zhu, Bing
    Yu, Hang
    Xie, Yaoqin
    Sun, Qiuirui
    ELECTRONICS, 2024, 13 (11)
  • [12] Emotion Recognition from Large-Scale Video Clips with Cross-Attention and Hybrid Feature Weighting Neural Networks
    Zhou, Siwei
    Wu, Xuemei
    Jiang, Fan
    Huang, Qionghao
    Huang, Changqin
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2023, 20 (02)
  • [13] Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning
    Sun, Bo
    Xu, Qihua
    He, Jun
    Yu, Lejun
    Li, Liandong
    Wei, Qinglan
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 621 - 631
  • [14] Multi-Modal Residual Perceptron Network for Audio-Video Emotion Recognition
    Chang, Xin
    Skarbek, Wladyslaw
    SENSORS, 2021, 21 (16)
  • [15] Emotion Recognition Using Fusion of Audio and Video Features
    Ortega, Juan D. S.
    Cardinal, Patrick
    Koerich, Alessandro L.
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 3847 - 3852
  • [16] Audio-Video based Emotion Recognition Using Minimum Cost Flow Algorithm
    Nguyen, Xuan-Bac
    Lee, Guee-Sang
    Kim, Soo-Hyung
    Yang, Hyung-Jeong
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 3737 - 3741
  • [17] Holistic-Based Cross-Attention Modal Fusion Network for Video Sign Language Recognition
    Gao, Qing
    Hu, Jing
    Mai, Haixing
    Ju, Zhaojie
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [18] Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition
    Zhao, Yimin
    Gu, Jin
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XI, 2024, 15011 : 276 - 285
  • [19] Audio-Visual Attention Networks for Emotion Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    AVSU'18: PROCEEDINGS OF THE 2018 WORKSHOP ON AUDIO-VISUAL SCENE UNDERSTANDING FOR IMMERSIVE MULTIMEDIA, 2018, : 27 - 32
  • [20] CAST: Cross-Attention in Space and Time for Video Action Recognition
    Lee, Dongho
    Lee, Jongseo
    Choi, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,