High-Level Geometry-based Features of Video Modality for Emotion Prediction

Cited by: 13

Authors:

Weber, Raphael [1]

Barrielle, Vincent [2]

Soladie, Catherine [1]

Seguier, Renaud [1]

Affiliations:

[1] IETR, FAST, CentraleSupelec, Ave Boulaie, F-35576 Cesson Sevigne, France

[2] Dynamixyz, 80 Ave Buttes de Coesmes, F-35700 Rennes, France

Source:

PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON AUDIO/VISUAL EMOTION CHALLENGE (AVEC'16) | 2016

Keywords:

HEAD POSE ESTIMATION; FACIAL EXPRESSIONS; REPRESENTATION

DOI:

10.1145/2988257.2988262

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The automatic analysis of emotion remains a challenging task in unconstrained experimental conditions. In this paper, we present our contribution to the 6th Audio/Visual Emotion Challenge (AVEC 2016), which aims at predicting the continuous emotional dimensions of arousal and valence. First, we propose to improve the performance of the multi-modal prediction with low-level features by adding high-level geometry-based features, namely head pose and expression signature. The head pose is estimated by fitting a reference 3D mesh to the 2D facial landmarks. The expression signature is the projection of the facial landmarks in an unsupervised person-specific model. Second, we propose to fuse the unimodal predictions trained on each training subject before performing the multimodal fusion. The results show that our high-level features improve the performance of the multi-modal prediction of arousal and that the subjects fusion works well in unimodal prediction but generalizes poorly in multimodal prediction, particularly on valence.
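
The two-stage fusion described in the abstract (first across training subjects within a modality, then across modalities) can be sketched roughly as follows. This is only an illustrative sketch: the abstract does not state which fusion rule the authors used, so plain averaging and a weighted average are assumptions, and the function names, modality names, weights, and toy values are all hypothetical.

```python
from statistics import fmean


def fuse_subjects(per_subject_preds):
    """Stage 1 (assumed rule): average, frame by frame, the predictions of
    models that were each trained on a single training subject."""
    return [fmean(frame) for frame in zip(*per_subject_preds)]


def fuse_modalities(per_modality_preds, weights):
    """Stage 2 (assumed rule): weighted average of the unimodal predictions.
    The weights are taken as given here; learning them is out of scope."""
    names = list(per_modality_preds)
    total = sum(weights[m] for m in names)
    n_frames = len(per_modality_preds[names[0]])
    return [
        sum(weights[m] * per_modality_preds[m][t] for m in names) / total
        for t in range(n_frames)
    ]


# Toy arousal predictions: two subject-specific models per modality, three frames.
video = fuse_subjects([[0.2, 0.4, 0.6], [0.4, 0.6, 0.8]])
audio = fuse_subjects([[0.0, 0.2, 0.4], [0.2, 0.4, 0.6]])

# Equal weights are an arbitrary choice for the illustration.
fused = fuse_modalities(
    {"video": video, "audio": audio},
    {"video": 1.0, "audio": 1.0},
)
print(fused)
```

Keeping the two stages as separate functions mirrors the order given in the abstract, where the per-subject unimodal predictions are fused before the multimodal fusion is performed.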

Pages: 51-58

Page count: 8
