Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Cited by: 0

Authors:

Mocanu, Bogdan [2]

Tapu, Ruxandra [1,2]

Affiliations:

[1] Inst Polytech Paris, ARTEMIS Dept, Telecom SudParis, 9 Rue Charles Fourier, F-91000 Evry, France

[2] Univ Politehn Bucuresti, Fac ETTI, Dept Telecommun, Bucharest, Romania

Source:

ADVANCES IN VISUAL COMPUTING, ISVC 2022, PT II | 2022 / Vol. 13599

Keywords:

Cross-modal emotion recognition; Self-attention; Spatial/channel and temporal attention; Audio and video fusion

DOI:

10.1007/978-3-031-20716-7_23

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification code:

081202; 0835

Abstract:

Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper we introduce a novel, fully automatic, multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the mutually complementary nature of the features while preserving modality-distinctive information. Specifically, we integrate spatial, channel and temporal attention into the visual processing pipeline and temporal self-attention into the audio branch. A multimodal cross-attention fusion strategy is then introduced that exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach, with average accuracy scores above 87.85%. Compared with state-of-the-art methods, the proposed framework returns accuracy gains of more than 1.85%.
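
The cross-attention fusion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the sketch assumes plain scaled dot-product attention with no learned projections, and the function names (`cross_attention`, `fuse`), the feature sizes, and the concatenation step are all hypothetical choices made for illustration. Each modality queries the other, and the attended result is concatenated with the original features so that modality-specific information is kept alongside the cross-modal context.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    queries: list of T_q vectors of size d
    keys:    list of T_k vectors of size d
    values:  list of T_k vectors (any size)
    Returns (attended vectors, attention weights).
    """
    d = len(queries[0])
    attended, weights = [], []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        attended.append(out)
        weights.append(w)
    return attended, weights


def fuse(audio, video):
    """Hypothetical fusion: audio queries video and video queries audio.

    Each attended result is concatenated with the original features
    (an assumption, not taken from the paper).
    """
    audio_ctx, _ = cross_attention(audio, video, video)
    video_ctx, _ = cross_attention(video, audio, audio)
    audio_out = [a + c for a, c in zip(audio, audio_ctx)]
    video_out = [v + c for v, c in zip(video, video_ctx)]
    return audio_out, video_out


# Toy inputs: 3 audio time steps and 2 video time steps, feature size 2.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[0.5, 0.5], [1.0, -1.0]]
audio_out, video_out = fuse(audio, video)
```

In this sketch each output vector has twice the input feature size: the first half is the unchanged modality feature and the second half is the context gathered from the other modality. A trainable model would normally add learned query, key and value projections and a classifier on top.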


Pages: 295-306

Page count: 12

