A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen [2]
Esposito, Anna [3]
Garcia, Oscar N. [4]
Gutierrez-Osuna, Ricardo [1]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminants analysis
DOI
10.1016/j.specom.2005.09.005
CLC number (Chinese Library Classification)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Linear Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, is able to outperform any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. (C) 2005 Elsevier B.V. All rights reserved.
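The abstract describes a three-stage pipeline: extract acoustic features, project them with Fisher's Linear Discriminants onto a subspace that separates orofacial configurations, then predict facial motion with a non-parametric k-nearest-neighbors mapping. The following is a minimal sketch of that pipeline using scikit-learn and synthetic stand-in data; the feature dimensions, the k-means step used to derive visual class labels for the discriminant analysis, and all parameter values are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the FLD-projection + kNN audio-visual mapping
# described in the abstract. Data, labeling scheme, and parameters
# are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Stand-ins for per-frame acoustic features (e.g., MFCC/LSF/prosody
# stacked) and per-frame orofacial motion parameters.
X = rng.normal(size=(2000, 39))   # acoustic feature vectors
Y = rng.normal(size=(2000, 10))   # orofacial motion parameters

# Fisher's discriminants need class labels, so this sketch derives
# "orofacial configuration" classes by clustering the visual data;
# the paper's own class definition may differ.
labels = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(Y)

# Supervised projection of acoustics onto the low-dimensional
# subspace that best discriminates the visual classes.
lda = LinearDiscriminantAnalysis(n_components=8).fit(X, labels)
X_proj = lda.transform(X)

# Non-parametric kNN mapping from projected acoustics to motion.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_proj, Y)
print(knn.predict(X_proj[:5]).shape)  # (5, 10)
```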
Pages: 598-615
Number of pages: 18
Related papers
50 items in total
  • [31] Analyzing Visible Articulatory Movements in Speech Production for Speech-Driven 3D Facial Animation
    Kim, Hyung Kyu
    Lee, Sangmin
    Kim, Hak Gu
    Proceedings - International Conference on Image Processing, ICIP, 2024, : 3575 - 3579
  • [32] DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
    Sun, Zhiyao
    Lv, Tian
    Ye, Sheng
    Lin, Matthieu
    Sheng, Jenny
    Wen, Yu-Hui
    Yu, Minjing
    Liu, Yong-Jin
    ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (04)
  • [33] Speech driven facial animation
    Yang, TJ
    Lin, IC
    Hung, CS
    Huang, CF
    Ming, OY
    COMPUTER ANIMATION AND SIMULATION'99, 1999, : 99 - 108
  • [34] SPACE : Speech-driven Portrait Animation with Controllable Expression
    Gururani, Siddharth
    Mallya, Arun
    Wang, Ting-Chun
    Valle, Rafael
    Liu, Ming-Yu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 20857 - 20866
  • [35] NewTalker: Exploring frequency domain for speech-driven 3D facial animation with Mamba
    Niu, Weiran
    Wang, Zan
    Li, Yi
    Lou, Tangtang
    IET Image Processing, 2025, 19 (01)
  • [36] CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation
    Liang, Xiangyu
    Zhuang, Wenlin
    Wang, Tianyong
    Geng, Guangxing
    Geng, Guangyue
    Xia, Haifeng
    Xia, Siyu
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024
  • [37] Speech-driven automatic facial expression synthesis
    Bozkurt, Elif
    Erdem, Cigdem Eroglu
    Erzin, Engin
    Erdem, Tanju
    Oezkan, Mehmet
    Tekalp, A. Murat
    2008 3DTV-CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO, 2008, : 253 - +
  • [38] Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
    Pham, Hai X.
    Cheung, Samuel
    Pavlovic, Vladimir
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 2328 - 2336
  • [39] Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation
    Fan, Yingruo
    Lin, Zhaojiang
    Saito, Jun
    Wang, Wenping
    Komura, Taku
    PROCEEDINGS OF THE ACM ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES, 2022, 5 (01)
  • [40] A low bit-rate web-enabled synthetic head with speech-driven facial animation
    Lin, IC
    Huang, CF
    Wu, JC
    Ouhyoung, M
    COMPUTER ANIMATION AND SIMULATION 2000, 2000, : 29 - 40