Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Cited by: 0
Authors
Mocanu, Bogdan [2 ]
Tapu, Ruxandra [1 ,2 ]
Affiliations
[1] Inst Polytech Paris, ARTEMIS Dept, Telecom SudParis, 9 Rue Charles Fourier, F-91000 Evry, France
[2] Univ Politehn Bucuresti, Fac ETTI, Dept Telecommun, Bucharest, Romania
Keywords
Cross-modal emotion recognition; Self-attention; Spatial/channel and temporal attention; Audio and video fusion;
DOI
10.1007/978-3-031-20716-7_23
CLC classification number
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract
Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper we introduce a novel, fully automatic, multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the mutually complementary nature of the features while preserving modality-specific information. Specifically, we integrate spatial, channel, and temporal attention into the visual processing pipeline, and temporal self-attention into the audio branch. Then, a multimodal cross-attention fusion strategy is introduced that effectively exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach, with average accuracy scores above 87.85%. Compared with state-of-the-art methods, the proposed framework achieves accuracy gains of more than 1.85%.
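The abstract does not specify the implementation details of the cross-attention fusion step. As a rough illustration of the general idea (each modality's features attend to the other modality before the attended representations are combined for classification), a minimal PyTorch sketch might look like the following. The feature dimensions, number of heads, mean pooling, and the 8-class output (matching the RAVDESS emotion categories) are assumptions for the example, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative audio-video cross-attention fusion (not the paper's exact model)."""

    def __init__(self, dim: int = 128, heads: int = 4, num_classes: int = 8):
        super().__init__()
        # Video queries attend over audio keys/values, and vice versa.
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, video_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (batch, video frames, dim); audio_feats: (batch, audio frames, dim)
        v_att, _ = self.video_from_audio(video_feats, audio_feats, audio_feats)
        a_att, _ = self.audio_from_video(audio_feats, video_feats, video_feats)
        # Temporal mean pooling of each attended sequence, then concatenation.
        fused = torch.cat([v_att.mean(dim=1), a_att.mean(dim=1)], dim=-1)
        return self.classifier(fused)

model = CrossModalAttentionFusion()
video = torch.randn(2, 16, 128)   # e.g. 16 sampled video frames
audio = torch.randn(2, 40, 128)   # e.g. 40 audio segment embeddings
logits = model(video, audio)      # shape: (2, 8)
```

The design choice illustrated here is that cross-attention lets each modality re-weight its counterpart's time steps, which is one common way to exploit the audio-video relationship the abstract describes.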
Pages: 295-306
Page count: 12