Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features

Cited by: 12
Authors
Chen, Shizhe [1 ]
Li, Xinrui [1 ]
Jin, Qin [1 ]
Zhang, Shilei [2 ]
Qin, Yong [2 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] IBM Res Lab, Beijing, Peoples R China
Keywords
Video Emotion Recognition; Multimodal Features; CNN; Late Fusion;
DOI
10.1145/2993148.2997629
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present our methods for the Audio-Video Based Emotion Recognition subtask in the 2016 Emotion Recognition in the Wild (EmotiW) Challenge. The task is to predict one of the seven basic emotions for the characters in video clips extracted from movies or TV shows. In our approach, we explore various multimodal features from the audio, facial image, and video motion modalities. The audio features comprise statistical acoustic features, MFCC Bag-of-Audio-Words, and MFCC Fisher Vectors. For image-related features, we extract hand-crafted features (LBP-TOP and SPM Dense SIFT) and learned features (CNN features). The improved Dense Trajectory is used as the motion-related feature. We train SVM, Random Forest, and Logistic Regression classifiers for each kind of feature. Among them, the MFCC Fisher Vector is the best acoustic feature, and the facial CNN feature is the most discriminative feature for emotion recognition. We use late fusion to combine the different modality features and achieve 50.76% accuracy on the test set, which significantly outperforms the baseline test accuracy of 40.47%.
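The late-fusion scheme the abstract describes — one classifier per modality, combined at the score level — can be sketched as a weighted average of per-modality class probabilities. The feature dimensions, fusion weights, and synthetic data below are illustrative assumptions, not the paper's actual values; the paper tunes its fusion weights on the validation set.

```python
# Hedged sketch of late fusion over per-modality classifiers, in the spirit
# of the paper's SVM / Random Forest / Logistic Regression setup. All
# dimensions, weights, and data here are assumed for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 40, 7  # seven basic emotions

# Stand-ins for per-modality feature vectors (dimensions are assumptions).
modalities = {
    "audio_fv": 64,   # e.g. MFCC Fisher Vector features
    "face_cnn": 128,  # e.g. facial CNN features
    "idt": 96,        # e.g. improved Dense Trajectory features
}
y_train = np.arange(n_train) % n_classes  # every class present in training
models = {
    "audio_fv": SVC(probability=True),
    "face_cnn": LogisticRegression(max_iter=1000),
    "idt": RandomForestClassifier(n_estimators=50, random_state=0),
}
# Fusion weights are assumed; in practice they are tuned on validation data.
weights = {"audio_fv": 0.3, "face_cnn": 0.5, "idt": 0.2}

fused = np.zeros((n_test, n_classes))
for name, dim in modalities.items():
    X_train = rng.normal(size=(n_train, dim))  # synthetic training features
    X_test = rng.normal(size=(n_test, dim))    # synthetic test features
    clf = models[name].fit(X_train, y_train)
    # predict_proba columns follow clf.classes_, i.e. sorted labels 0..6 here.
    fused += weights[name] * clf.predict_proba(X_test)

pred = fused.argmax(axis=1)  # one fused emotion label per test clip
```

Score-level (late) fusion like this lets each modality keep its best-suited classifier and makes it cheap to drop or reweight a weak modality, which is why it is a common choice in the EmotiW challenge entries.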
Pages: 494-500
Page count: 7
Related Papers
50 records
  • [1] Multimodal Fusion of Spatial-Temporal Features for Emotion Recognition in the Wild
    Wang, Zuchen
    Fang, Yuchun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 205 - 214
  • [2] Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild
    Sun, Bo
    Li, Liandong
    Zhou, Guoyan
    Wu, Xuewen
    He, Jun
    Yu, Lejun
    Li, Dongxue
    Wei, Qinglan
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 497 - 502
  • [3] Multimodal Fusion based on Information Gain for Emotion Recognition in the Wild
    Ghaleb, Esam
    Popa, Mirela
    Hortal, Enrique
    Asteriadis, Stylianos
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 814 - 823
  • [4] Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
    Liu, Xiaodong
    Li, Songyang
    Wang, Miao
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [5] Video multimodal emotion recognition based on Bi-GRU and attention fusion
    Huan, Ruo-Hong
    Shu, Jia
    Bao, Sheng-Lin
    Liang, Rong-Hua
    Chen, Peng
    Chi, Kai-Kai
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8213 - 8240
  • [6] Multimodal Fusion Using Kernel-Based ELM for Video Emotion Recognition
    Duan, Lijuan
    Ge, Hui
    Yang, Zhen
    Chen, Juncheng
    PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 371 - 381
  • [7] Multimodal Emotion Recognition Based on Feature Fusion
    Xu, Yurui
    Wu, Xiao
    Su, Hang
    Liu, Xiaorui
    2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 7 - 11
  • [8] Emotion Recognition Using Fusion of Audio and Video Features
    Ortega, Juan D. S.
    Cardinal, Patrick
    Koerich, Alessandro L.
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 3847 - 3852
  • [9] A multiple feature fusion framework for video emotion recognition in the wild
    Samadiani, Najmeh
    Huang, Guangyan
    Luo, Wei
    Chi, Chi-Hung
    Shu, Yanfeng
    Wang, Rui
    Kocaturk, Tuba
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (08):