Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Cited by: 0
Authors
Mocanu, Bogdan [2 ]
Tapu, Ruxandra [1 ,2 ]
Affiliations
[1] Inst Polytech Paris, ARTEMIS Dept, Telecom SudParis, 9 Rue Charles Fourier, F-91000 Evry, France
[2] Univ Politehn Bucuresti, Fac ETTI, Dept Telecommun, Bucharest, Romania
Keywords
Cross-modal emotion recognition; Self-attention; Spatial/channel and temporal attention; Audio and video fusion;
DOI
10.1007/978-3-031-20716-7_23
CLC classification number
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract
Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper we introduce a novel, fully automatic, multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the mutually complementary nature of the features while preserving modality-specific information. Specifically, we integrate spatial, channel, and temporal attention into the visual processing pipeline, and temporal self-attention into the audio branch. Then, a multimodal cross-attention fusion strategy is introduced that effectively exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach, with average accuracy scores above 87.85%. Compared with state-of-the-art methods, the proposed framework achieves accuracy gains of more than 1.85%.
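The abstract does not specify the implementation details of the cross-attention fusion step. As a rough illustration of the general idea (each modality's features attend to the other modality before the attended representations are combined for classification), a minimal PyTorch sketch might look like the following. The feature dimensions, number of heads, mean pooling, and the 8-class output (matching the RAVDESS emotion categories) are assumptions for the example, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative audio-video cross-attention fusion (not the paper's exact model)."""

    def __init__(self, dim: int = 128, heads: int = 4, num_classes: int = 8):
        super().__init__()
        # Video queries attend over audio keys/values, and vice versa.
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, video_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (batch, video frames, dim); audio_feats: (batch, audio frames, dim)
        v_att, _ = self.video_from_audio(video_feats, audio_feats, audio_feats)
        a_att, _ = self.audio_from_video(audio_feats, video_feats, video_feats)
        # Temporal mean pooling of each attended sequence, then concatenation.
        fused = torch.cat([v_att.mean(dim=1), a_att.mean(dim=1)], dim=-1)
        return self.classifier(fused)

model = CrossModalAttentionFusion()
video = torch.randn(2, 16, 128)   # e.g. 16 sampled video frames
audio = torch.randn(2, 40, 128)   # e.g. 40 audio segment embeddings
logits = model(video, audio)      # shape: (2, 8)
```

The design choice illustrated here is that cross-attention lets each modality re-weight its counterpart's time steps, which is one common way to exploit the audio-video relationship the abstract describes.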
Pages: 295-306
Page count: 12