Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features

Cited by: 12
Authors
Chen, Shizhe [1 ]
Li, Xinrui [1 ]
Jin, Qin [1 ]
Zhang, Shilei [2 ]
Qin, Yong [2 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] IBM Res Lab, Beijing, Peoples R China
Keywords
Video Emotion Recognition; Multimodal Features; CNN; Late Fusion;
DOI
10.1145/2993148.2997629
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present our methods for the Audio-Video Based Emotion Recognition subtask in the 2016 Emotion Recognition in the Wild (EmotiW) Challenge. The task is to predict one of the seven basic emotions for the characters in video clips extracted from movies or TV shows. In our approach, we explore various multimodal features from the audio, facial image, and video motion modalities. The audio features comprise statistical acoustic features, MFCC Bag-of-Audio-Words, and MFCC Fisher Vectors. For image-related features, we extract hand-crafted features (LBP-TOP and SPM Dense SIFT) and learned features (CNN features). The improved Dense Trajectory is used as the motion-related feature. We train SVM, Random Forest, and Logistic Regression classifiers for each kind of feature. Among them, the MFCC Fisher Vector is the best acoustic feature, and the facial CNN feature is the most discriminative feature for emotion recognition. We use late fusion to combine the different modality features and achieve 50.76% accuracy on the test set, which significantly outperforms the baseline test accuracy of 40.47%.
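The late-fusion scheme the abstract describes — one classifier per modality, combined at the score level — can be sketched as a weighted average of per-modality class probabilities. The feature dimensions, fusion weights, and synthetic data below are illustrative assumptions, not the paper's actual values; the paper tunes its fusion weights on the validation set.

```python
# Hedged sketch of late fusion over per-modality classifiers, in the spirit
# of the paper's SVM / Random Forest / Logistic Regression setup. All
# dimensions, weights, and data here are assumed for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 40, 7  # seven basic emotions

# Stand-ins for per-modality feature vectors (dimensions are assumptions).
modalities = {
    "audio_fv": 64,   # e.g. MFCC Fisher Vector features
    "face_cnn": 128,  # e.g. facial CNN features
    "idt": 96,        # e.g. improved Dense Trajectory features
}
y_train = np.arange(n_train) % n_classes  # every class present in training
models = {
    "audio_fv": SVC(probability=True),
    "face_cnn": LogisticRegression(max_iter=1000),
    "idt": RandomForestClassifier(n_estimators=50, random_state=0),
}
# Fusion weights are assumed; in practice they are tuned on validation data.
weights = {"audio_fv": 0.3, "face_cnn": 0.5, "idt": 0.2}

fused = np.zeros((n_test, n_classes))
for name, dim in modalities.items():
    X_train = rng.normal(size=(n_train, dim))  # synthetic training features
    X_test = rng.normal(size=(n_test, dim))    # synthetic test features
    clf = models[name].fit(X_train, y_train)
    # predict_proba columns follow clf.classes_, i.e. sorted labels 0..6 here.
    fused += weights[name] * clf.predict_proba(X_test)

pred = fused.argmax(axis=1)  # one fused emotion label per test clip
```

Score-level (late) fusion like this lets each modality keep its best-suited classifier and makes it cheap to drop or reweight a weak modality, which is why it is a common choice in the EmotiW challenge entries.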
Pages: 494-500
Page count: 7
Related Papers
50 records
  • [1] Multimodal Fusion of Spatial-Temporal Features for Emotion Recognition in the Wild
    Wang, Zuchen
    Fang, Yuchun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 205 - 214
  • [2] Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild
    Sun, Bo
    Li, Liandong
    Zhou, Guoyan
    Wu, Xuewen
    He, Jun
    Yu, Lejun
    Li, Dongxue
    Wei, Qinglan
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 497 - 502
  • [3] Multimodal Fusion based on Information Gain for Emotion Recognition in the Wild
    Ghaleb, Esam
    Popa, Mirela
    Hortal, Enrique
    Asteriadis, Stylianos
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 814 - 823
  • [4] Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
    Liu, Xiaodong
    Li, Songyang
    Wang, Miao
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [5] Video multimodal emotion recognition based on Bi-GRU and attention fusion
    Huan, Ruo-Hong
    Shu, Jia
    Bao, Sheng-Lin
    Liang, Rong-Hua
    Chen, Peng
    Chi, Kai-Kai
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8213 - 8240
  • [6] Multimodal Fusion Using Kernel-Based ELM for Video Emotion Recognition
    Duan, Lijuan
    Ge, Hui
    Yang, Zhen
    Chen, Juncheng
    PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 371 - 381
  • [7] Multimodal Emotion Recognition Based on Feature Fusion
    Xu, Yurui
    Wu, Xiao
    Su, Hang
    Liu, Xiaorui
    2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 7 - 11
  • [8] Emotion Recognition Using Fusion of Audio and Video Features
    Ortega, Juan D. S.
    Cardinal, Patrick
    Koerich, Alessandro L.
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 3847 - 3852
  • [9] A multiple feature fusion framework for video emotion recognition in the wild
    Samadiani, Najmeh
    Huang, Guangyan
    Luo, Wei
    Chi, Chi-Hung
    Shu, Yanfeng
    Wang, Rui
    Kocaturk, Tuba
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (08):