Fusion of deep learning features with mixture of brain emotional learning for audio-visual emotion recognition

Cited by: 24
Authors
Farhoudi, Zeinab [1 ]
Setayeshi, Saeed [2 ]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sci & Res Branch, Tehran, Iran
[2] Amirkabir Univ Technol, Dept Energy Engn & Phys, Tehran, Iran
Keywords
Audio-Visual emotion recognition; Brain emotional learning; Deep learning; Convolutional neural networks; Mixture of network; Multimodal fusion; MODEL;
DOI
10.1016/j.specom.2020.12.001
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Multimodal emotion recognition is a challenging task because emotions are expressed through different modalities over time in video clips. Considering the spatial-temporal correlation present in video, we propose an audio-visual fusion model that combines deep learning features with a Mixture of Brain Emotional Learning (MoBEL) model inspired by the brain's limbic system. The proposed model consists of two stages. First, deep learning methods, specifically Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), are applied to extract highly abstract features. Second, the fusion model, namely MoBEL, is designed to learn jointly from the previously extracted audio-visual features. For the visual modality, a 3D-CNN is used to learn the spatial-temporal features of visual expression. For the auditory modality, Mel-spectrograms of the speech signals are fed into a CNN-RNN for spatial-temporal feature extraction. The high-level feature fusion approach with the MoBEL network exploits the correlation between the visual and auditory modalities to improve emotion recognition performance. Experimental results on the eNTERFACE'05 database demonstrate that the proposed method outperforms hand-crafted features and other state-of-the-art information fusion models for video emotion recognition.
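The following is a minimal, non-authoritative sketch in PyTorch of the two-stage pipeline the abstract describes: a 3D-CNN visual branch, a CNN-RNN audio branch over Mel-spectrograms, and a high-level fusion head. All module names, layer sizes, the six-class output (assuming the six basic emotions of eNTERFACE'05), and the gated mixture-of-experts head that stands in for the MoBEL fusion stage are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only; layer sizes and the simplified fusion head are
# assumptions, and the gated mixture of experts merely approximates the
# paper's MoBEL (mixture of brain emotional learning) fusion stage.
import torch
import torch.nn as nn

class VisualBranch(nn.Module):
    """3D-CNN over a clip of frames -> spatial-temporal visual features."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, clips):            # clips: (B, 3, T, H, W)
        x = self.conv(clips).flatten(1)  # (B, 32)
        return self.fc(x)                # (B, feat_dim)

class AudioBranch(nn.Module):
    """CNN over Mel-spectrograms followed by an RNN over the time axis."""
    def __init__(self, n_mels=64, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),        # pool frequency, keep time steps
        )
        self.rnn = nn.GRU(16 * (n_mels // 2), feat_dim, batch_first=True)

    def forward(self, mels):             # mels: (B, 1, n_mels, T)
        x = self.cnn(mels)               # (B, 16, n_mels//2, T)
        x = x.permute(0, 3, 1, 2).flatten(2)   # (B, T, 16 * n_mels//2)
        _, h = self.rnn(x)
        return h[-1]                     # (B, feat_dim)

class MixtureFusion(nn.Module):
    """Gated mixture of small expert networks over the concatenated
    audio-visual features (a stand-in for the MoBEL fusion stage)."""
    def __init__(self, feat_dim=128, n_experts=4, n_classes=6):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                           nn.Linear(64, n_classes)) for _ in range(n_experts)])
        self.gate = nn.Linear(2 * feat_dim, n_experts)

    def forward(self, v_feat, a_feat):
        z = torch.cat([v_feat, a_feat], dim=1)                   # high-level fusion
        w = torch.softmax(self.gate(z), dim=1)                   # (B, n_experts)
        out = torch.stack([e(z) for e in self.experts], dim=1)   # (B, E, C)
        return (w.unsqueeze(-1) * out).sum(dim=1)                # (B, n_classes)

if __name__ == "__main__":
    clips = torch.randn(2, 3, 16, 64, 64)   # batch of 16-frame RGB clips
    mels = torch.randn(2, 1, 64, 100)       # batch of Mel-spectrograms
    logits = MixtureFusion()(VisualBranch()(clips), AudioBranch()(mels))
    print(logits.shape)                      # torch.Size([2, 6])

In practice the two branches would be trained (or pre-trained) on the visual clips and speech spectrograms separately, with the fusion head learning how much to trust each expert for a given audio-visual feature pair; the sketch above only fixes the tensor shapes and data flow.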
Pages: 92-103
Page count: 12