EmoNets: Multimodal deep learning approaches for emotion recognition in video

Cited by: 242
|
Authors
Kahou, Samira Ebrahimi [1 ]
Bouthillier, Xavier [3 ]
Lamblin, Pascal [3 ]
Gulcehre, Caglar [3 ]
Michalski, Vincent [2 ]
Konda, Kishore [2 ]
Jean, Sebastien [3 ]
Froumenty, Pierre [1 ]
Dauphin, Yann [3 ]
Boulanger-Lewandowski, Nicolas [3 ]
Ferrari, Raul Chandias [3 ]
Mirza, Mehdi [3 ]
Warde-Farley, David [3 ]
Courville, Aaron [3 ]
Vincent, Pascal [3 ]
Memisevic, Roland [3 ]
Pal, Christopher [1 ]
Bengio, Yoshua [3 ]
Affiliations
[1] Univ Montreal, Ecole Polytech Montreal, Montreal, PQ, Canada
[2] Goethe Univ Frankfurt, D-60054 Frankfurt, Germany
[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Emotion recognition; Deep learning; Model combination; Multimodal learning;
DOI
10.1007/s12193-015-0195-2
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which combine features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network, which captures visual information in detected faces; a deep belief net, which represents the audio stream; a K-Means-based "bag-of-mouths" model, which extracts visual features around the mouth region; and a relational autoencoder, which addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.
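The abstract describes fusing the outputs of several modality-specific classifiers into a single seven-way prediction. A minimal sketch of one common late-fusion strategy, a weighted average of per-class probabilities, is shown below; the model names, weights, and probability values are illustrative assumptions, not the paper's actual fusion method or numbers.

```python
# Hedged sketch of late fusion over modality-specific emotion classifiers.
# All names, weights, and probabilities here are illustrative assumptions.

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def fuse_predictions(modality_probs, weights):
    """Return the weighted average of per-modality probability vectors."""
    total = sum(weights.values())
    fused = [0.0] * len(EMOTIONS)
    for name, probs in modality_probs.items():
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

# Hypothetical per-modality outputs for one video clip.
probs = {
    "face_cnn":      [0.05, 0.05, 0.10, 0.55, 0.10, 0.10, 0.05],
    "audio_dbn":     [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10],
    "bag_of_mouths": [0.10, 0.10, 0.10, 0.35, 0.15, 0.10, 0.10],
}
weights = {"face_cnn": 0.5, "audio_dbn": 0.3, "bag_of_mouths": 0.2}

fused = fuse_predictions(probs, weights)
prediction = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(prediction)  # the highest-probability fused emotion
```

Because each input vector sums to 1 and the weights are normalized, the fused vector is again a valid probability distribution, so the final label is simply its argmax.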
Pages: 99-111
Page count: 13