Fusion of deep learning features with mixture of brain emotional learning for audio-visual emotion recognition

Cited by: 24
Authors
Farhoudi, Zeinab [1 ]
Setayeshi, Saeed [2 ]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sci & Res Branch, Tehran, Iran
[2] Amirkabir Univ Technol, Dept Energy Engn & Phys, Tehran, Iran
Keywords
Audio-Visual emotion recognition; Brain emotional learning; Deep learning; Convolutional neural networks; Mixture of network; Multimodal fusion; MODEL;
DOI
10.1016/j.specom.2020.12.001
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Multimodal emotion recognition is a challenging task because emotions are expressed through different modalities over time in video clips. Considering the spatial-temporal correlation present in video, we propose an audio-visual fusion model that combines deep learning features with a Mixture of Brain Emotional Learning (MoBEL) model inspired by the brain's limbic system. The proposed model consists of two stages. First, deep learning methods, specifically Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), are applied to extract highly abstract features. Second, the fusion model, namely MoBEL, is designed to learn jointly from the previously extracted audio-visual features. For the visual modality, a 3D-CNN is used to learn the spatial-temporal features of visual expression. For the auditory modality, Mel-spectrograms of the speech signals are fed into a CNN-RNN for spatial-temporal feature extraction. The high-level feature fusion approach with the MoBEL network exploits the correlation between the visual and auditory modalities to improve emotion recognition performance. Experimental results on the eNTERFACE'05 database demonstrate that the proposed method outperforms hand-crafted features and other state-of-the-art information fusion models for video emotion recognition.
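The following is a minimal, non-authoritative sketch in PyTorch of the two-stage pipeline the abstract describes: a 3D-CNN visual branch, a CNN-RNN audio branch over Mel-spectrograms, and a high-level fusion head. All module names, layer sizes, the six-class output (assuming the six basic emotions of eNTERFACE'05), and the gated mixture-of-experts head that stands in for the MoBEL fusion stage are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only; layer sizes and the simplified fusion head are
# assumptions, and the gated mixture of experts merely approximates the
# paper's MoBEL (mixture of brain emotional learning) fusion stage.
import torch
import torch.nn as nn

class VisualBranch(nn.Module):
    """3D-CNN over a clip of frames -> spatial-temporal visual features."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, clips):            # clips: (B, 3, T, H, W)
        x = self.conv(clips).flatten(1)  # (B, 32)
        return self.fc(x)                # (B, feat_dim)

class AudioBranch(nn.Module):
    """CNN over Mel-spectrograms followed by an RNN over the time axis."""
    def __init__(self, n_mels=64, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),        # pool frequency, keep time steps
        )
        self.rnn = nn.GRU(16 * (n_mels // 2), feat_dim, batch_first=True)

    def forward(self, mels):             # mels: (B, 1, n_mels, T)
        x = self.cnn(mels)               # (B, 16, n_mels//2, T)
        x = x.permute(0, 3, 1, 2).flatten(2)   # (B, T, 16 * n_mels//2)
        _, h = self.rnn(x)
        return h[-1]                     # (B, feat_dim)

class MixtureFusion(nn.Module):
    """Gated mixture of small expert networks over the concatenated
    audio-visual features (a stand-in for the MoBEL fusion stage)."""
    def __init__(self, feat_dim=128, n_experts=4, n_classes=6):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                           nn.Linear(64, n_classes)) for _ in range(n_experts)])
        self.gate = nn.Linear(2 * feat_dim, n_experts)

    def forward(self, v_feat, a_feat):
        z = torch.cat([v_feat, a_feat], dim=1)                   # high-level fusion
        w = torch.softmax(self.gate(z), dim=1)                   # (B, n_experts)
        out = torch.stack([e(z) for e in self.experts], dim=1)   # (B, E, C)
        return (w.unsqueeze(-1) * out).sum(dim=1)                # (B, n_classes)

if __name__ == "__main__":
    clips = torch.randn(2, 3, 16, 64, 64)   # batch of 16-frame RGB clips
    mels = torch.randn(2, 1, 64, 100)       # batch of Mel-spectrograms
    logits = MixtureFusion()(VisualBranch()(clips), AudioBranch()(mels))
    print(logits.shape)                      # torch.Size([2, 6])

In practice the two branches would be trained (or pre-trained) on the visual clips and speech spectrograms separately, with the fusion head learning how much to trust each expert for a given audio-visual feature pair; the sketch above only fixes the tensor shapes and data flow.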
Pages: 92-103
Page count: 12