A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen
Esposito, Anna
Garcia, Oscar N.
Gutierrez-Osuna, Ricardo [1 ]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminant analysis
DOI
10.1016/j.specom.2005.09.005
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques in terms of their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Band Features, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, outperforms any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. (C) 2005 Elsevier B.V. All rights reserved.
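The pipeline the abstract describes, a supervised projection via Fisher's Linear Discriminants followed by a non-parametric k-nearest-neighbors audio-visual mapping, can be sketched in a few lines. The snippet below is a minimal illustration using scikit-learn and synthetic data, not the authors' implementation: the feature dimensions, the use of k-means to quantize orofacial frames into the discrete classes that FLD requires, and all variable names are assumptions for demonstration only.

```python
# Minimal sketch of an FLD-projection + kNN audio-visual mapping,
# assuming synthetic data in place of the paper's audio-visual corpus.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 2000 frames of 39-dim acoustic features
# (e.g., MFCCs with deltas) paired with 20-dim orofacial coordinates.
X_acoustic = rng.normal(size=(2000, 39))
Y_orofacial = rng.normal(size=(2000, 20))

# FLD needs class labels; one plausible choice (an assumption, not
# necessarily the paper's) is to vector-quantize the orofacial frames.
classes = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(Y_orofacial)

# Supervised dimensionality reduction: project the acoustics onto the
# subspace that best discriminates the orofacial classes.
fld = LinearDiscriminantAnalysis(n_components=8).fit(X_acoustic, classes)
Z = fld.transform(X_acoustic)

# Non-parametric audio-visual mapping: k-nearest-neighbors regression
# from the projected acoustics to the orofacial coordinates.
knn = KNeighborsRegressor(n_neighbors=5).fit(Z, Y_orofacial)
Y_pred = knn.predict(fld.transform(X_acoustic[:10]))
print(Y_pred.shape)  # (10, 20): predicted orofacial motion for 10 frames
```

In the study itself, the acoustic features would be one of the coding models under comparison (LPC, LSF, MFCC, etc.), possibly stacked across neighboring frames to expose coarticulation effects to the mapping.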
Pages: 598-615
Page count: 18
Related papers (50 in total)
  • [21] Sadoughi, Najmeh; Busso, Carlos. Speech-driven animation with meaningful behaviors. SPEECH COMMUNICATION, 2019, 110: 90-100.
  • [22] Zhang, Xitie; Wu, Suping. CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024: 1175-1179.
  • [23] Liu, Jingying; Hui, Binyuan; Li, Kun; Liu, Yunke; Lai, Yu-Kun; Zhang, Yuxiang; Liu, Yebin; Yang, Jingyu. Geometry-Guided Dense Perspective Network for Speech-Driven Facial Animation. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2022, 28(12): 4873-4886.
  • [24] Xing, Jinbo; Xia, Menghan; Zhang, Yuechen; Cun, Xiaodong; Wang, Jue; Wong, Tien-Tsin. CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 12780-12790.
  • [25] Stan, Stefan; Haque, Kazi Injamamul; Yumak, Zerrin. FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion. 15TH ANNUAL ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION AND GAMES, MIG 2023, 2023.
  • [26] Terissi, Lucas D.; Gomez, Juan Carlos. Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation. ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2008, PROCEEDINGS, 2008, 5249: 33-42.
  • [27] Fu, Hui; Wang, Zeqing; Gong, Ke; Wang, Keze; Chen, Tianshui; Li, Haojie; Zeng, Haifeng; Kang, Wenxiong. Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024: 1770-1777.
  • [28] Xu, Zhihao; Gong, Shengjie; Tang, Jiapeng; Liang, Lingyu; Huang, Yining; Li, Haojie; Huang, Shuangping. KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding. COMPUTER VISION - ECCV 2024, PT LVI, 2025, 15114: 236-253.
  • [29] Wu, Haozhe; Zhou, Songtao; Jia, Jia; Xing, Junliang; Wen, Qi; Wen, Xiang. Speech-Driven 3D Face Animation with Composite and Regional Facial Movements. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 6822-6830.
  • [30] Deena, Salil; Galata, Aphrodite. Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model. ADVANCES IN VISUAL COMPUTING, PT 1, PROCEEDINGS, 2009, 5875: 89-100.