A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12

Authors:

Kakumanu, Praveen

Esposito, Anna

Garcia, Oscar N.

Gutierrez-Osuna, Ricardo [1]

Affiliations:

[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA

[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA

[3] Univ Naples 2, Dept Psychol, Naples, Italy

[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA


Source:

SPEECH COMMUNICATION | 2006 / Vol. 48 / No. 6

Keywords:

speech-driven facial animation; audio-visual mapping; linear discriminants analysis;

DOI:

10.1016/j.specom.2005.09.005

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This article presents a thorough experimental comparison of several acoustic modeling techniques in terms of their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system; (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech; (3) spectral energy and fundamental frequency, which capture prosodic aspects; and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, outperforms any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. (C) 2005 Elsevier B.V. All rights reserved.
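The pipeline outlined in the abstract (acoustic context for coarticulation, a supervised Fisher-discriminant projection, then k-nearest-neighbor mapping to facial parameters) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `stack_context`, `fisher_lda` and `knn_predict` are hypothetical, the class labels for "orofacial configurations" are assumed to be given (for example by clustering the visual features), the k-NN prediction is assumed to be an unweighted mean of the neighbors' visual vectors, and the demo data are synthetic.

```python
import numpy as np


def stack_context(X, width):
    """Concatenate each frame with `width` neighbors on each side.

    Edges are padded by repeating the first/last frame. This exposes
    coarticulation context to the mapping (assumed context scheme).
    """
    padded = np.pad(X, ((width, width), (0, 0)), mode="edge")
    n = X.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * width + 1)])


def fisher_lda(X, labels, n_components, ridge=1e-6):
    """Multi-class Fisher LDA: solve Sb w = lambda Sw w, keep top directions."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * diff @ diff.T
    Sw += ridge * np.eye(d)  # small ridge keeps Sw positive definite
    # Whiten with the Cholesky factor so the problem becomes symmetric.
    L = np.linalg.cholesky(Sw)  # Sw = L @ L.T
    Linv = np.linalg.inv(L)
    _, U = np.linalg.eigh(Linv @ Sb @ Linv.T)  # eigenvalues ascending
    return Linv.T @ U[:, ::-1][:, :n_components]


def knn_predict(A_train, V_train, A_query, k):
    """Predict visual vectors as the mean of the k nearest training frames."""
    d2 = ((A_query[:, None, :] - A_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return V_train[idx].mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_audio, d_video, n_classes = 300, 12, 6, 4
    # Synthetic stand-ins: each configuration class shifts both streams.
    labels = rng.integers(0, n_classes, size=n)
    centres_a = rng.normal(scale=3.0, size=(n_classes, d_audio))
    centres_v = rng.normal(size=(n_classes, d_video))
    A = centres_a[labels] + rng.normal(size=(n, d_audio))
    V = centres_v[labels] + 0.1 * rng.normal(size=(n, d_video))

    A_ctx = stack_context(A, width=2)  # 60-dim context vectors
    W = fisher_lda(A_ctx, labels, n_components=n_classes - 1)
    Z = A_ctx @ W  # 3-dim discriminant space
    train, test = slice(0, 250), slice(250, None)
    V_hat = knn_predict(Z[train], V[train], Z[test], k=5)
    rmse = np.sqrt(((V_hat - V[test]) ** 2).mean())
    print(f"test RMSE on synthetic data: {rmse:.3f}")
```

The Cholesky whitening turns the generalized eigenproblem into a symmetric one that `np.linalg.eigh` can solve stably. The context width, `k`, and the number of discriminant directions are free parameters here, not values taken from the paper.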


Pages: 598-615

Number of pages: 18
