Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Cited by: 0

Authors:

Mocanu, Bogdan [2]

Tapu, Ruxandra [1,2]

Affiliations:

[1] Inst Polytech Paris, ARTEMIS Dept, Telecom SudParis, 9 Rue Charles Fourier, F-91000 Evry, France

[2] Univ Politehn Bucuresti, Fac ETTI, Dept Telecommun, Bucharest, Romania

Source:

ADVANCES IN VISUAL COMPUTING, ISVC 2022, PT II | 2022 / Vol. 13599

Keywords:

Cross-modal emotion recognition; Self-attention; Spatial/channel and temporal attention; Audio and video fusion

DOI:

10.1007/978-3-031-20716-7_23

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification code:

081202; 0835

Abstract:

Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper, we introduce a novel, fully automatic, multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the mutually complementary nature of the features while maintaining modality-distinctive information. Specifically, we integrate spatial, channel, and temporal attention into the visual processing pipeline and temporal self-attention into the audio branch. Then, a multimodal cross-attention fusion strategy is introduced that effectively exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach, with average accuracy scores above 87.85%. Compared with state-of-the-art methods, the proposed framework yields accuracy gains of more than 1.85%.
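The cross-attention fusion idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the abstract gives no layer sizes, projection details, or pooling strategy, so the function names (`softmax`, `cross_attention`), the feature dimensions, the random projection matrices, and the mean-pooling plus concatenation step are all assumptions chosen for illustration. The sketch only shows the generic mechanism in which each modality's features act as queries over the other modality's features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    # Scaled dot-product attention: queries come from one modality,
    # keys and values come from the other.
    q = query_feats @ w_q        # (T_query, d)
    k = context_feats @ w_k      # (T_context, d)
    v = context_feats @ w_v      # (T_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

# Hypothetical sizes: 6 audio frames, 4 video frames, shared width 8.
rng = np.random.default_rng(0)
t_audio, t_video, d_audio, d_video, d = 6, 4, 16, 32, 8
audio = rng.standard_normal((t_audio, d_audio))
video = rng.standard_normal((t_video, d_video))

# Random matrices stand in for learned projections (assumption).
wq_a = rng.standard_normal((d_audio, d))
wk_a = rng.standard_normal((d_audio, d))
wv_a = rng.standard_normal((d_audio, d))
wq_v = rng.standard_normal((d_video, d))
wk_v = rng.standard_normal((d_video, d))
wv_v = rng.standard_normal((d_video, d))

# Audio queries attend to video, and video queries attend to audio.
audio_from_video, w_av = cross_attention(audio, video, wq_a, wk_v, wv_v)
video_from_audio, w_va = cross_attention(video, audio, wq_v, wk_a, wv_a)

# Illustrative fusion: mean-pool over time, then concatenate.
fused = np.concatenate([audio_from_video.mean(axis=0),
                        video_from_audio.mean(axis=0)])
```

In a trained model the projections would be learned, and the fused vector would feed a classifier over the emotion classes. Keeping the two attention directions separate before concatenation is one simple way to retain modality-specific information alongside the cross-modal terms.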


Pages: 295-306

Page count: 12
