A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen [2]
Esposito, Anna [3]
Garcia, Oscar N. [4]
Gutierrez-Osuna, Ricardo [1]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminants analysis
DOI
10.1016/j.specom.2005.09.005
CLC number (Chinese Library Classification)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Linear Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, is able to outperform any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. (C) 2005 Elsevier B.V. All rights reserved.
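The abstract describes a three-stage pipeline: extract acoustic features, project them with Fisher's Linear Discriminants onto a subspace that separates orofacial configurations, then predict facial motion with a non-parametric k-nearest-neighbors mapping. The following is a minimal sketch of that pipeline using scikit-learn and synthetic stand-in data; the feature dimensions, the k-means step used to derive visual class labels for the discriminant analysis, and all parameter values are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the FLD-projection + kNN audio-visual mapping
# described in the abstract. Data, labeling scheme, and parameters
# are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Stand-ins for per-frame acoustic features (e.g., MFCC/LSF/prosody
# stacked) and per-frame orofacial motion parameters.
X = rng.normal(size=(2000, 39))   # acoustic feature vectors
Y = rng.normal(size=(2000, 10))   # orofacial motion parameters

# Fisher's discriminants need class labels, so this sketch derives
# "orofacial configuration" classes by clustering the visual data;
# the paper's own class definition may differ.
labels = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(Y)

# Supervised projection of acoustics onto the low-dimensional
# subspace that best discriminates the visual classes.
lda = LinearDiscriminantAnalysis(n_components=8).fit(X, labels)
X_proj = lda.transform(X)

# Non-parametric kNN mapping from projected acoustics to motion.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_proj, Y)
print(knn.predict(X_proj[:5]).shape)  # (5, 10)
```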
Pages: 598-615
Number of pages: 18
Related papers
50 items in total
  • [31] Analyzing Visible Articulatory Movements in Speech Production for Speech-Driven 3D Facial Animation
    Kim, Hyung Kyu
    Lee, Sangmin
    Kim, Hak Gu
    Proceedings - International Conference on Image Processing, ICIP, 2024, : 3575 - 3579
  • [32] DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
    Sun, Zhiyao
    Lv, Tian
    Ye, Sheng
    Lin, Matthieu
    Sheng, Jenny
    Wen, Yu-Hui
    Yu, Minjing
    Liu, Yong-Jin
    ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (04)
  • [33] Speech driven facial animation
    Yang, TJ
    Lin, IC
    Hung, CS
    Huang, CF
    Ming, OY
    COMPUTER ANIMATION AND SIMULATION'99, 1999, : 99 - 108
  • [34] SPACE : Speech-driven Portrait Animation with Controllable Expression
    Gururani, Siddharth
    Mallya, Arun
    Wang, Ting-Chun
    Valle, Rafael
    Liu, Ming-Yu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 20857 - 20866
  • [35] NewTalker: Exploring frequency domain for speech-driven 3D facial animation with Mamba
    Niu, Weiran
    Wang, Zan
    Li, Yi
    Lou, Tangtang
    IET Image Processing, 2025, 19 (01)
  • [36] CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation
    Liang, Xiangyu
    Zhuang, Wenlin
    Wang, Tianyong
    Geng, Guangxing
    Geng, Guangyue
    Xia, Haifeng
    Xia, Siyu
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024
  • [37] Speech-driven automatic facial expression synthesis
    Bozkurt, Elif
    Erdem, Cigdem Eroglu
    Erzin, Engin
    Erdem, Tanju
    Oezkan, Mehmet
    Tekalp, A. Murat
    2008 3DTV-CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO, 2008, : 253 - +
  • [38] Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
    Pham, Hai X.
    Cheung, Samuel
    Pavlovic, Vladimir
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 2328 - 2336
  • [39] Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation
    Fan, Yingruo
    Lin, Zhaojiang
    Saito, Jun
    Wang, Wenping
    Komura, Taku
    PROCEEDINGS OF THE ACM ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES, 2022, 5 (01)
  • [40] A low bit-rate web-enabled synthetic head with speech-driven facial animation
    Lin, IC
    Huang, CF
    Wu, JC
    Ouhyoung, M
    COMPUTER ANIMATION AND SIMULATION 2000, 2000, : 29 - 40