Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Times cited: 0
Authors
Mocanu, Bogdan [2]
Tapu, Ruxandra [1, 2]
Affiliations
[1] Inst Polytech Paris, ARTEMIS Dept, Telecom SudParis, 9 Rue Charles Fourier, F-91000 Evry, France
[2] Univ Politehn Bucuresti, Fac ETTI, Dept Telecommun, Bucharest, Romania
Keywords
Cross-modal emotion recognition; Self-attention; Spatial/channel and temporal attention; Audio and video fusion
DOI
10.1007/978-3-031-20716-7_23
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper, we introduce a novel, fully automatic, multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the mutually complementary nature of the features while preserving modality-specific information. Specifically, we integrate spatial, channel, and temporal attention into the visual processing pipeline and temporal self-attention into the audio branch. A multimodal cross-attention fusion strategy is then introduced to exploit the relationship between the audio and video features. Experimental evaluation on RAVDESS, a publicly available database, validates the proposed approach, with average accuracy scores above 87.85%. Compared with state-of-the-art methods, the proposed framework yields accuracy gains of more than 1.85%.
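To make the intramodal/intermodal distinction concrete, the sketch below shows generic scaled dot-product attention used two ways: as temporal self-attention within one modality (queries, keys, and values from the same sequence) and as cross-attention between modalities (audio queries attending to video keys/values, and vice versa). This is not the authors' implementation: it omits learned projections, the spatial/channel attention of the visual branch, and any feature extractors, and it assumes both modalities are already embedded into the same feature dimension `d`. The function names and the mean-pool-then-concatenate fusion step are hypothetical choices made for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = feature dimensions


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention with no learned projections.

    queries: T_q x d, keys: T_k x d, values: T_k x d_v -> returns T_q x d_v.
    Each output row is a softmax-weighted (convex) combination of value rows.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def temporal_self_attention(seq: Matrix) -> Matrix:
    # Intramodal: each time step attends to every time step of the same modality.
    return attention(seq, seq, seq)


def mean_over_time(m: Matrix) -> List[float]:
    # Average each feature column over all time steps.
    return [sum(col) / len(m) for col in zip(*m)]


def cross_attention_fusion(audio: Matrix, video: Matrix) -> List[float]:
    # Intermodal (hypothetical scheme): audio queries attend to video, video
    # queries attend to audio; both results are pooled over time and concatenated.
    audio_to_video = attention(audio, video, video)
    video_to_audio = attention(video, audio, audio)
    return mean_over_time(audio_to_video) + mean_over_time(video_to_audio)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 audio steps, d = 2 (toy data)
    video = [[0.5, 0.5], [1.0, 0.0]]              # 2 video frames, d = 2 (toy data)
    fused = cross_attention_fusion(temporal_self_attention(audio),
                                   temporal_self_attention(video))
    print(len(fused))  # prints 4
```

Because cross-attention reuses the same `attention` routine with keys and values drawn from the other modality, sequences of different lengths (3 audio steps vs. 2 video frames here) can still be aligned, which is one reason attention-based fusion is a common choice for audio-visual models.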
Pages: 295-306
Number of pages: 12