A CROSS-ATTENTION EMOTION RECOGNITION ALGORITHM BASED ON AUDIO AND VIDEO MODALITIES

Cited: 0
Authors
Wu, Xiao [1 ]
Mu, Xuan [1 ]
Qi, Wen [3 ]
Liu, Xiaorui [1 ,2 ]
Institutions
[1] Qingdao Univ, Automat Sch, 308 Ningxia Rd, Qingdao 266000, Peoples R China
[2] Shandong Key Lab Ind Control, Qingdao 266071, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 511436, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords
multimodal; emotion recognition; parallel convolution; cross attention;
D O I
10.1109/ICASSPW62465.2024.10626511
CLC Classification
O42 [Acoustics];
Subject Classification
070206 ; 082403 ;
Abstract
In recent years, emotion recognition has received significant attention. In this paper, multimodal information, including speech and facial expressions, is adopted to classify human emotions. First, we propose a speech emotion recognition model based on a parallel convolutional module (Pconv) and a facial-expression emotion recognition model based on an improved Inception-ResNetV2 network. The recognized features of speech and expression are then fused by a cross-attention module coordinated with a Bidirectional Long Short-Term Memory (BiLSTM) network. Experiments on the CH-SIMS and CMU-MOSI datasets demonstrate that the proposed algorithm achieves high recognition accuracy, and each component of the model contributes fairly to the performance improvement.
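The abstract describes fusing speech and expression features with a cross-attention module. The paper does not give implementation details, so the following is only an illustrative sketch of generic scaled dot-product cross-attention in numpy, where one modality's features act as queries against the other modality's keys and values; the shapes, dimensions, and function names are assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_value_feats):
    """Scaled dot-product cross-attention: one modality attends to the other.

    query_feats:     (T_q, d) array, e.g. speech-frame features
    key_value_feats: (T_k, d) array, e.g. facial-expression features
    Returns fused features of shape (T_q, d).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_value_feats.T / np.sqrt(d)  # (T_q, T_k)
    weights = softmax(scores, axis=-1)  # each speech frame weights the video frames
    return weights @ key_value_feats    # (T_q, d) fused representation

rng = np.random.default_rng(0)
audio = rng.standard_normal((8, 16))   # 8 speech frames, 16-dim features (assumed sizes)
video = rng.standard_normal((12, 16))  # 12 video frames, 16-dim features
fused = cross_attention(audio, video)
print(fused.shape)  # (8, 16)
```

In the paper's pipeline the fused sequence would then be passed through a BiLSTM before classification; learned query/key/value projections, omitted here for brevity, would normally precede the attention step.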
Pages: 309 - 313
Page count: 5
Related Papers
50 records in total
  • [1] Mutual Cross-Attention in Dyadic Fusion Networks for Audio-Video Emotion Recognition
    Luo, Jiachen
    Phan, Huy
    Wang, Lin
    Reiss, Joshua
    2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,
  • [2] A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
    Praveen, R. Gnana
    de Melo, Wheidima Carneiro
    Ullah, Nasib
    Aslam, Haseeb
    Zeeshan, Osama
    Denorme, Theo
    Pedersoli, Marco
    Koerich, Alessandro L.
    Bacon, Simon
    Cardinal, Patrick
    Granger, Eric
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2485 - 2494
  • [3] Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
    Praveen, R. Gnana
    Cardinal, Patrick
    Granger, Eric
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2023, 5 (03): : 360 - 373
  • [4] AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition
    Das, Avishek
    Sarma, Moumita Sen
    Hoque, Mohammed Moshiul
    Siddique, Nazmul
    Dewan, M. Ali Akber
    SENSORS, 2024, 24 (18)
  • [5] CAST: Cross-Attention in Space and Time for Video Action Recognition
    Lee, Dongho
    Lee, Jongseo
    Choi, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Multimodal Cross-Attention Bayesian Network for Social News Emotion Recognition
    Wang, Xinzhi
    Li, Mengyue
    Chang, Yudong
    Luo, Xiangfeng
    Yao, Yige
    Li, Zhichao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4693 - 4697
  • [8] Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism
    Liu Tianbao
    Zhang Lingtao
    Yu Wentao
    Wei Dongchuan
    Fan Yijun
    LASER & OPTOELECTRONICS PROGRESS, 2021, 58 (02)
  • [9] Cross-Attention Transformer for Video Interpolation
    Kim, Hannah Halin
    Yu, Shuzhi
    Yuan, Shuai
    Tomasi, Carlo
    COMPUTER VISION - ACCV 2022 WORKSHOPS, 2023, 13848 : 325 - 342
  • [10] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245