Fusion of deep learning features with mixture of brain emotional learning for audio-visual emotion recognition

Cited by: 24
Authors
Farhoudi, Zeinab [1 ]
Setayeshi, Saeed [2 ]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sci & Res Branch, Tehran, Iran
[2] Amirkabir Univ Technol, Dept Energy Engn & Phys, Tehran, Iran
Keywords
Audio-visual emotion recognition; Brain emotional learning; Deep learning; Convolutional neural networks; Mixture of networks; Multimodal fusion
DOI
10.1016/j.specom.2020.12.001
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Multimodal emotion recognition is a challenging task because emotions are expressed through different modalities over the course of a video clip. Considering the spatial-temporal correlations present in video, we propose an audio-visual fusion model that combines deep learning features with a Mixture of Brain Emotional Learning (MoBEL) model inspired by the brain's limbic system. The proposed model consists of two stages. First, deep learning methods, specifically Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are applied to learn highly abstract feature representations. Second, the fusion model, MoBEL, is designed to learn jointly from the previously extracted audio-visual features. For the visual modality, a 3D-CNN is used to learn the spatial-temporal features of visual expressions. For the auditory modality, Mel-spectrograms of the speech signals are fed into a CNN-RNN for spatial-temporal feature extraction. A high-level feature fusion approach based on the MoBEL network is presented to exploit the correlation between the visual and auditory modalities and thereby improve emotion recognition performance. Experimental results on the eNTERFACE'05 database demonstrate that the proposed method outperforms both hand-crafted features and other state-of-the-art information fusion models in video emotion recognition.
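To make the two-stage pipeline described in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' implementation), assuming PyTorch: a 3D-CNN branch over visual frame volumes, a CNN-RNN branch over Mel-spectrograms, and a small MLP head standing in for the MoBEL fusion network. All module names (VisualBranch, AudioBranch, FusionModel), layer sizes, and tensor shapes are hypothetical.

# Illustrative sketch only -- not the paper's implementation. Assumes PyTorch.
# All names and layer sizes are hypothetical; a plain MLP head stands in for
# the MoBEL fusion network described in the abstract.
import torch
import torch.nn as nn

class VisualBranch(nn.Module):
    """3D-CNN over a clip of frames: (batch, channels, time, height, width)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # global spatio-temporal pooling
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class AudioBranch(nn.Module):
    """CNN over the Mel-spectrogram, then an RNN over the time axis."""
    def __init__(self, n_mels=64, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),      # pool over frequency, keep time steps
        )
        self.rnn = nn.GRU(32 * (n_mels // 2), feat_dim, batch_first=True)

    def forward(self, x):              # x: (batch, 1, n_mels, time)
        h = self.cnn(x)                # (batch, 32, n_mels//2, time)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, time, features)
        _, last = self.rnn(h)          # last hidden state summarizes the clip
        return last.squeeze(0)         # (batch, feat_dim)

class FusionModel(nn.Module):
    """Concatenates both feature streams; the MLP head stands in for MoBEL."""
    def __init__(self, feat_dim=128, n_emotions=6):
        super().__init__()
        self.visual = VisualBranch(feat_dim)
        self.audio = AudioBranch(feat_dim=feat_dim)
        self.head = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_emotions))

    def forward(self, frames, mel):
        return self.head(torch.cat([self.visual(frames), self.audio(mel)], dim=1))

# Example shapes: 16 RGB frames at 112x112 and a 64-mel spectrogram of 100 steps.
model = FusionModel()
logits = model(torch.randn(2, 3, 16, 112, 112), torch.randn(2, 1, 64, 100))
print(logits.shape)  # torch.Size([2, 6]) -- six basic emotions, as in eNTERFACE'05

The MLP head only marks where high-level fusion happens; in the paper that role is played by the MoBEL network, a mixture of brain-emotional-learning models, whose internals the abstract does not specify.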
Pages: 92-103
Page count: 12
Related papers
50 records in total
  • [1] Learning Affective Features With a Hybrid Deep Model for Audio-Visual Emotion Recognition
    Zhang, Shiqing
    Zhang, Shiliang
    Huang, Tiejun
    Gao, Wen
    Tian, Qi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 3030 - 3043
  • [2] Deep Learning Based Audio-Visual Emotion Recognition in a Smart Learning Environment
    Ivleva, Natalja
    Pentel, Avar
    Dunajeva, Olga
    Justsenko, Valeria
    TOWARDS A HYBRID, FLEXIBLE AND SOCIALLY ENGAGED HIGHER EDUCATION, VOL 1, ICL 2023, 2024, 899 : 420 - 431
  • [3] Emotion recognition using deep learning approach from audio-visual emotional big data
    Hossain, M. Shamim
    Muhammad, Ghulam
    INFORMATION FUSION, 2019, 49 : 69 - 78
  • [4] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03)
  • [5] Leveraging recent advances in deep learning for audio-Visual emotion recognition
    Schoneveld, Liam
    Othmani, Alice
    Abdelkawy, Hazem
    PATTERN RECOGNITION LETTERS, 2021, 146 : 1 - 7
  • [6] Deep Learning for Audio Visual Emotion Recognition
    Hussain, T.
    Wang, W.
    Bouaynaya, N.
    Fathallah-Shaykh, H.
    Mihaylova, L.
    2022 25TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2022), 2022
  • [7] Deep learning based multimodal emotion recognition using model-level fusion of audio-visual modalities
    Middya, Asif Iqbal
    Nag, Baibhav
    Roy, Sarbani
    KNOWLEDGE-BASED SYSTEMS, 2022, 244
  • [8] Joint low rank embedded multiple features learning for audio-visual emotion recognition
    Wang, Zhan
    Wang, Lizhi
    Huang, Hua
    NEUROCOMPUTING, 2020, 388 : 324 - 333
  • [9] An Active Learning Paradigm for Online Audio-Visual Emotion Recognition
    Kansizoglou, Ioannis
    Bampis, Loukas
    Gasteratos, Antonios
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (02) : 756 - 768