EmoNets: Multimodal deep learning approaches for emotion recognition in video

Cited by: 243
Authors
Kahou, Samira Ebrahimi [1 ]
Bouthillier, Xavier [3 ]
Lamblin, Pascal [3 ]
Gulcehre, Caglar [3 ]
Michalski, Vincent [2 ]
Konda, Kishore [2 ]
Jean, Sebastien [3 ]
Froumenty, Pierre [1 ]
Dauphin, Yann [3 ]
Boulanger-Lewandowski, Nicolas [3 ]
Ferrari, Raul Chandias [3 ]
Mirza, Mehdi [3 ]
Warde-Farley, David [3 ]
Courville, Aaron [3 ]
Vincent, Pascal [3 ]
Memisevic, Roland [3 ]
Pal, Christopher [1 ]
Bengio, Yoshua [3 ]
Affiliations
[1] Univ Montreal, Ecole Polytech Montreal, Montreal, PQ, Canada
[2] Goethe Univ Frankfurt, D-60054 Frankfurt, Germany
[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Emotion recognition; Deep learning; Model combination; Multimodal learning;
DOI
10.1007/s12193-015-0195-2
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which combine features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network focusing on capturing visual information in detected faces, a deep belief net focusing on the representation of the audio stream, a K-Means-based "bag-of-mouths" model which extracts visual features around the mouth region, and a relational autoencoder which addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.
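The fusion step described in the abstract, combining per-modality predictions into one classifier, can be sketched as a weighted average of class-probability vectors. This is a simplified stand-in for the paper's learned combination strategies, not the authors' exact method; the modality names, probability values, and weights below are illustrative assumptions:

```python
import numpy as np

# The seven EmotiW emotion classes.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def fuse_predictions(probs_per_modality, weights=None):
    """Fuse per-modality class probabilities by weighted averaging.

    probs_per_modality: sequence of length-7 probability vectors,
    one per specialist model. weights: optional per-modality weights
    (defaults to a uniform average). Returns a fused distribution.
    """
    probs = np.asarray(probs_per_modality, dtype=float)   # (n_modalities, 7)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)
    weights = np.asarray(weights, dtype=float)
    fused = weights @ probs                               # weighted sum -> (7,)
    return fused / fused.sum()                            # renormalise

# Hypothetical outputs of three specialist models for one clip
# (face CNN, audio DBN, bag-of-mouths) -- values are made up.
face  = [0.05, 0.05, 0.10, 0.60, 0.05, 0.10, 0.05]
audio = [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10]
mouth = [0.05, 0.10, 0.05, 0.50, 0.10, 0.10, 0.10]

fused = fuse_predictions([face, audio, mouth], weights=[0.5, 0.25, 0.25])
print(EMOTIONS[int(np.argmax(fused))])  # prints "happy"
```

In practice the per-modality weights would be tuned on validation data rather than fixed by hand; the paper also explores learned combination models beyond simple averaging.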
Pages: 99 - 111
Page count: 13
Related papers
50 items total
  • [2] Chinese Multimodal Emotion Recognition in Deep and Traditional Machine Learning Approaches
    Miao, Haotian
    Zhang, Yifei
    Li, Weipeng
    Zhang, Haoran
    Wang, Daling
    Feng, Shi
    [J]. 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [3] Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video
    Wang, Zhongmin
    Zhou, Xiaoxiao
    Wang, Wenlang
    Liang, Chen
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (04) : 923 - 934
  • [5] Emotion Recognition Using Multimodal Deep Learning
    Liu, Wei
    Zheng, Wei-Long
    Lu, Bao-Liang
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT II, 2016, 9948 : 521 - 529
  • [6] Emotion Recognition on Multimodal with Deep Learning and Ensemble
    Dharma, David Adi
    Zahra, Amalia
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12): 656 - 663
  • [8] Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning
    Sun, Bo
    Xu, Qihua
    He, Jun
    Yu, Lejun
    Li, Liandong
    Wei, Qinglan
    [J]. PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 621 - 631
  • [9] Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations
    Meng, Tao
    Shou, Yuntao
    Ai, Wei
    Yin, Nan
    Li, Keqin
    [J]. IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (12): 6472 - 6487
  • [10] DEEP MULTIMODAL LEARNING FOR EMOTION RECOGNITION IN SPOKEN LANGUAGE
    Gu, Yue
    Chen, Shuhong
    Marsic, Ivan
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5079 - 5083