A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen
Esposito, Anna
Garcia, Oscar N.
Gutierrez-Osuna, Ricardo [1 ]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminant analysis
DOI
10.1016/j.specom.2005.09.005
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques in terms of their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Band Features, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates between different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed with a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, outperforms any individual acoustic model for speech-driven facial animation. These results are validated on the 450 phonetically compact sentences of the TIMIT corpus. (C) 2005 Elsevier B.V. All rights reserved.
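To make the pipeline in the abstract concrete, the sketch below assembles its three stages in Python: perceptual acoustic coding (MFCCs are used here as a stand-in for the several feature sets the paper compares), a supervised projection with Fisher's Linear Discriminants, and non-parametric k-nearest-neighbors prediction of orofacial motion. This is a minimal illustration under stated assumptions, not the authors' implementation: the file names, the use of librosa and scikit-learn, the k-means quantization of orofacial configurations into class labels for the LDA step, and all parameter values are assumptions introduced for this example.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

# --- 1. Acoustic coding: 13 MFCCs per 10 ms frame (illustrative values).
audio, sr = librosa.load("utterance.wav", sr=16000)        # hypothetical file
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13,
                            hop_length=160, n_fft=400).T   # (frames, 13)

# Synchronized orofacial motion targets, e.g. facial marker coordinates
# per audio frame (placeholder; in practice tracked from video).
visual = np.load("orofacial_markers.npy")                  # (frames, n_markers)

# Keep only frames present in both streams so the arrays stay aligned.
n = min(len(mfcc), len(visual))
mfcc, visual = mfcc[:n], visual[:n]

# --- 2. Supervised dimensionality reduction with Fisher's LDA.
# LDA needs discrete classes; one convenient stand-in is to quantize the
# continuous orofacial configurations into clusters and discriminate
# between those clusters in the acoustic space.
labels = KMeans(n_clusters=16, n_init=10).fit_predict(visual)
lda = LinearDiscriminantAnalysis(n_components=8).fit(mfcc, labels)
mfcc_lda = lda.transform(mfcc)                             # (frames, 8)

# --- 3. Non-parametric audio-to-visual mapping via k-NN regression:
# each acoustic frame is mapped to the orofacial motion of its nearest
# training frames in the discriminant subspace.
knn = KNeighborsRegressor(n_neighbors=5).fit(mfcc_lda, visual)
predicted_motion = knn.predict(lda.transform(mfcc))        # (frames, n_markers)
```

The k-means step above is only one way to obtain the class labels that scikit-learn's LDA requires; the paper itself trains the discriminant projection to separate different orofacial configurations, so any reasonable discretization of the visual targets could play that role in this sketch.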
Pages: 598-615 (18 pages)