Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Cited by: 0
Authors
Mocanu, Bogdan [2]
Tapu, Ruxandra [1, 2]
Affiliations
[1] Inst Polytech Paris, ARTEMIS Dept, Telecom SudParis, 9 Rue Charles Fourier, F-91000 Evry, France
[2] Univ Politehn Bucuresti, Fac ETTI, Dept Telecommun, Bucharest, Romania
Keywords
Cross-modal emotion recognition; Self-attention; Spatial/channel and temporal attention; Audio and video fusion
DOI
10.1007/978-3-031-20716-7_23
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper, we introduce a novel, fully automatic multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the complementary nature of the features while preserving modality-distinctive information. Specifically, we integrate spatial, channel and temporal attention into the visual processing pipeline and temporal self-attention into the audio branch. We then introduce a multimodal cross-attention fusion strategy that effectively exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach, with average accuracy scores above 87.85%. Compared with state-of-the-art methods, the proposed framework yields accuracy gains of more than 1.85%.
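The abstract names two attention patterns: temporal self-attention within a modality and cross-attention between audio and video. The Python sketch below illustrates both with plain scaled dot-product attention. It is not the authors' implementation: it has no learned projections, the feature dimensions are arbitrary, and the function names `attention` and `cross_modal_fusion` are hypothetical. The residual sum followed by mean pooling and concatenation is only an assumed way to keep modality-specific information next to the cross-modal context.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    queries: Tq vectors of size d; keys: Tk vectors of size d;
    values: Tk vectors. Returns (outputs, weights), where weights
    is a Tq x Tk matrix whose rows sum to 1.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one column at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


def mean_pool(seq):
    """Average a sequence of vectors over the time axis."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def cross_modal_fusion(audio, video):
    """Assumed fusion scheme (hypothetical, for illustration only).

    Each modality queries the other; the attended context is added to
    the original features (residual), then both are pooled and
    concatenated into a single clip-level vector.
    """
    audio_ctx, _ = attention(audio, video, video)  # audio attends to video
    video_ctx, _ = attention(video, audio, audio)  # video attends to audio
    audio_fused = [[a + c for a, c in zip(x, y)]
                   for x, y in zip(audio, audio_ctx)]
    video_fused = [[v + c for v, c in zip(x, y)]
                   for x, y in zip(video, video_ctx)]
    return mean_pool(audio_fused) + mean_pool(video_fused)


# Toy inputs: 3 audio time steps and 2 video time steps, 4 features each.
audio = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
video = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.4]]

# Temporal self-attention: queries, keys and values share one modality.
audio_self, audio_weights = attention(audio, audio, audio)

fused = cross_modal_fusion(audio, video)  # vector of length 8
```

In a trained model, the queries, keys and values would pass through learned linear projections, and the fused vector would feed a classifier over the emotion categories. Both are omitted here to keep the sketch short.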
Pages: 295-306
Number of pages: 12
Related papers
  • [11] Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
    Choi, Dong Yoon
    Kim, Deok-Hwan
    Song, Byung Cheol
    IEEE ACCESS, 2020, 8 : 203814 - 203826
  • [12] Facial emotion recognition on video using deep attention based bidirectional LSTM with equilibrium optimizer
    Vedantham, Ramachandran
    Reddy, Edara Sreenivasa
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28681 - 28711
  • [14] Context-Aware Attention Network for Human Emotion Recognition in Video
    Liu, Xiaodong
    Wang, Miao
    ADVANCES IN MULTIMEDIA, 2020, 2020
  • [15] Audio-Video Fusion with Double Attention for Multimodal Emotion Recognition
    Mocanu, Bogdan
    Tapu, Ruxandra
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022
  • [16] Workout Action Recognition in Video Streams Using an Attention Driven Residual DC-GRU Network
    Dey, Arnab
    Biswas, Samit
    Le, Dac-Nhuong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 79 (02) : 3067 - 3087
  • [17] Intermodal attention modulates visual processing in dorsal and ventral streams
    Cate, A. D.
    Herron, T. J.
    Kang, X.
    Yund, E. W.
    Woods, D. L.
    NEUROIMAGE, 2012, 63 (03) : 1295 - 1304
  • [18] Lightweight attention mechanisms for EEG emotion recognition for brain computer interface
    Gunda, Naresh Kumar
    Khalaf, Mohammed I.
    Bhatnagar, Shaleen
    Quraishi, Aadam
    Gudala, Leeladhar
    Venkata, Ashok Kumar Pamidi
    Alghayadh, Faisal Yousef
    Alsubai, Shtwai
    Bhatnagar, Vaibhav
    JOURNAL OF NEUROSCIENCE METHODS, 2024, 410
  • [19] Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms
    Kim, Junghun
    An, Yoojin
    Kim, Jihie
    INTERSPEECH 2022, 2022 : 136 - 140
  • [20] Multi-Attention Fusion Network for Video-based Emotion Recognition
    Wang, Yanan
    Wu, Jianming
    Hoashi, Keiichiro
    ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019 : 595 - 601