EmoNets: Multimodal deep learning approaches for emotion recognition in video

Cited by: 242
|
Authors
Kahou, Samira Ebrahimi [1 ]
Bouthillier, Xavier [3 ]
Lamblin, Pascal [3 ]
Gulcehre, Caglar [3 ]
Michalski, Vincent [2 ]
Konda, Kishore [2 ]
Jean, Sebastien [3 ]
Froumenty, Pierre [1 ]
Dauphin, Yann [3 ]
Boulanger-Lewandowski, Nicolas [3 ]
Ferrari, Raul Chandias [3 ]
Mirza, Mehdi [3 ]
Warde-Farley, David [3 ]
Courville, Aaron [3 ]
Vincent, Pascal [3 ]
Memisevic, Roland [3 ]
Pal, Christopher [1 ]
Bengio, Yoshua [3 ]
Affiliations
[1] Univ Montreal, Ecole Polytech Montreal, Montreal, PQ, Canada
[2] Goethe Univ Frankfurt, D-60054 Frankfurt, Germany
[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Emotion recognition; Deep learning; Model combination; Multimodal learning;
DOI
10.1007/s12193-015-0195-2
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which combine features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network, which captures visual information in detected faces; a deep belief net, which represents the audio stream; a K-Means-based "bag-of-mouths" model, which extracts visual features around the mouth region; and a relational autoencoder, which addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.
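The abstract describes fusing the outputs of several modality-specific classifiers into a single seven-way prediction. A minimal sketch of one common late-fusion strategy, a weighted average of per-class probabilities, is shown below; the model names, weights, and probability values are illustrative assumptions, not the paper's actual fusion method or numbers.

```python
# Hedged sketch of late fusion over modality-specific emotion classifiers.
# All names, weights, and probabilities here are illustrative assumptions.

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def fuse_predictions(modality_probs, weights):
    """Return the weighted average of per-modality probability vectors."""
    total = sum(weights.values())
    fused = [0.0] * len(EMOTIONS)
    for name, probs in modality_probs.items():
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

# Hypothetical per-modality outputs for one video clip.
probs = {
    "face_cnn":      [0.05, 0.05, 0.10, 0.55, 0.10, 0.10, 0.05],
    "audio_dbn":     [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10],
    "bag_of_mouths": [0.10, 0.10, 0.10, 0.35, 0.15, 0.10, 0.10],
}
weights = {"face_cnn": 0.5, "audio_dbn": 0.3, "bag_of_mouths": 0.2}

fused = fuse_predictions(probs, weights)
prediction = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(prediction)  # the highest-probability fused emotion
```

Because each input vector sums to 1 and the weights are normalized, the fused vector is again a valid probability distribution, so the final label is simply its argmax.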
Pages: 99-111
Page count: 13