Mutual Cross-Attention in Dyadic Fusion Networks for Audio-Video Emotion Recognition

Cited by: 0
Authors
Luo, Jiachen [1 ]
Phan, Huy [2 ]
Wang, Lin [1 ]
Reiss, Joshua [1 ]
Affiliations
[1] Queen Mary Univ London, Ctr Digital Mus, London, England
[2] Amazon Alexa, Cambridge, MA USA
Source
2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023
Keywords
affective computing; modality fusion; attention mechanism; deep learning; FEATURES; DATABASES; MODELS;
DOI
10.1109/ACIIW59127.2023.10388147
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal emotion recognition is a challenging problem in the research fields of human-computer interaction and pattern recognition. Efficiently finding a common subspace among heterogeneous multimodal data remains an open problem for audio-video emotion recognition. In this work, we propose an attentive audio-video fusion network for an emotional dialogue system that learns attentive contextual dependency, speaker information, and the interaction between the audio and video modalities. We employ pre-trained models, wav2vec and the Distract your Attention Network, to extract high-level audio and video representations, respectively. Using weighted fusion based on a cross-attention module, the cross-modality encoder focuses on inter-modality relations and selectively captures effective information across the audio and video modalities. Specifically, bidirectional gated recurrent unit models capture long-term contextual information, explore speaker influence, and learn intra- and inter-modal interactions of the audio and video modalities in a dynamic manner. We evaluate the approach on the MELD dataset, and the experimental results show that the proposed approach achieves state-of-the-art performance on this dataset.
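The fusion scheme described in the abstract, in which each modality attends to the other via cross-attention, the attended streams are combined by a learned weighted sum, and a bidirectional GRU models context, can be sketched as below. This is an illustrative reconstruction under assumed dimensions and layer choices (feature size 256, 4 attention heads, mean pooling over time), not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch: audio and video streams attend to each other via
    cross-attention, then a learned weight fuses the two attended
    representations before a bidirectional GRU models context.
    Dimensions and pooling are illustrative assumptions."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # audio-as-query attending to video, and vice versa
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # fusion weight
        # bidirectional GRU; hidden dim // 2 per direction keeps output at dim
        self.context = nn.GRU(dim, dim // 2, batch_first=True,
                              bidirectional=True)

    def forward(self, audio, video):
        # audio: (B, Ta, D), video: (B, Tv, D)
        a_att, _ = self.a2v(audio, video, video)  # audio queries video
        v_att, _ = self.v2a(video, audio, audio)  # video queries audio
        # mean-pool over time, then weighted fusion of the two modalities
        fused = self.alpha * a_att.mean(1) + (1 - self.alpha) * v_att.mean(1)
        out, _ = self.context(fused.unsqueeze(1))  # (B, 1, D)
        return out.squeeze(1)                      # (B, D)

model = CrossModalFusion()
a = torch.randn(2, 10, 256)  # e.g. wav2vec utterance features (assumed shape)
v = torch.randn(2, 8, 256)   # e.g. DAN facial features (assumed shape)
z = model(a, v)
print(z.shape)  # torch.Size([2, 256])
```

Because the cross-attention operates over full sequences, the two modalities need not be time-aligned or of equal length; only the feature dimension must match.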
Pages: 7
Related Papers
50 entries in total
  • [1] Audio-Video Fusion with Double Attention for Multimodal Emotion Recognition
    Mocanu, Bogdan
    Tapu, Ruxandra
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,
  • [2] A CROSS-ATTENTION EMOTION RECOGNITION ALGORITHM BASED ON AUDIO AND VIDEO MODALITIES
    Wu, Xiao
    Mu, Xuan
    Qi, Wen
    Liu, Xiaorui
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 309 - 313
  • [3] Active Speaker Recognition using Cross Attention Audio-Video Fusion
    Mocanu, Bogdan
    Tapu, Ruxandra
    2022 10TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2022,
  • [4] Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning
    Mocanu, Bogdan
    Tapu, Ruxandra
    Zaharia, Titus
    IMAGE AND VISION COMPUTING, 2023, 133
  • [5] Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition
    Zhang, Yuanyuan
    Wang, Zi-Rui
    Du, Jun
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [6] Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition
    Zhou, Hengshun
    Meng, Debin
    Zhang, Yuanyuan
    Peng, Xiaojiang
    Du, Jun
    Wang, Kai
    Qiao, Yu
    ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019, : 562 - 566
  • [7] A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
    Praveen, R. Gnana
    de Melo, Wheidima Carneiro
    Ullah, Nasib
    Aslam, Haseeb
    Zeeshan, Osama
    Denorme, Theo
    Pedersoli, Marco
    Koerich, Alessandro L.
    Bacon, Simon
    Cardinal, Patrick
    Granger, Eric
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2485 - 2494
  • [8] Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
    Praveen, R. Gnana
    Cardinal, Patrick
    Granger, Eric
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2023, 5 (03): : 360 - 373
  • [9] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [10] Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition
    Gu, Yue
    Lyu, Xinyu
    Sun, Weijia
    Li, Weitian
    Chen, Shuhong
    Li, Xinyu
    Marsic, Ivan
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 157 - 165