A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen
Esposito, Anna
Garcia, Oscar N.
Gutierrez-Osuna, Ricardo [1 ]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminant analysis
DOI
10.1016/j.specom.2005.09.005
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques in terms of their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Band Features, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, outperforms any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. (C) 2005 Elsevier B.V. All rights reserved.
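The pipeline the abstract describes, a supervised projection via Fisher's Linear Discriminants followed by a non-parametric k-nearest-neighbors audio-visual mapping, can be sketched in a few lines. The snippet below is a minimal illustration using scikit-learn and synthetic data, not the authors' implementation: the feature dimensions, the use of k-means to quantize orofacial frames into the discrete classes that FLD requires, and all variable names are assumptions for demonstration only.

```python
# Minimal sketch of an FLD-projection + kNN audio-visual mapping,
# assuming synthetic data in place of the paper's audio-visual corpus.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 2000 frames of 39-dim acoustic features
# (e.g., MFCCs with deltas) paired with 20-dim orofacial coordinates.
X_acoustic = rng.normal(size=(2000, 39))
Y_orofacial = rng.normal(size=(2000, 20))

# FLD needs class labels; one plausible choice (an assumption, not
# necessarily the paper's) is to vector-quantize the orofacial frames.
classes = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(Y_orofacial)

# Supervised dimensionality reduction: project the acoustics onto the
# subspace that best discriminates the orofacial classes.
fld = LinearDiscriminantAnalysis(n_components=8).fit(X_acoustic, classes)
Z = fld.transform(X_acoustic)

# Non-parametric audio-visual mapping: k-nearest-neighbors regression
# from the projected acoustics to the orofacial coordinates.
knn = KNeighborsRegressor(n_neighbors=5).fit(Z, Y_orofacial)
Y_pred = knn.predict(fld.transform(X_acoustic[:10]))
print(Y_pred.shape)  # (10, 20): predicted orofacial motion for 10 frames
```

In the study itself, the acoustic features would be one of the coding models under comparison (LPC, LSF, MFCC, etc.), possibly stacked across neighboring frames to expose coarticulation effects to the mapping.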
Pages: 598-615
Page count: 18
Related papers (50 in total)
  • [21] Sadoughi, Najmeh; Busso, Carlos. Speech-driven animation with meaningful behaviors. SPEECH COMMUNICATION, 2019, 110: 90-100.
  • [22] Zhang, Xitie; Wu, Suping. CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024: 1175-1179.
  • [23] Liu, Jingying; Hui, Binyuan; Li, Kun; Liu, Yunke; Lai, Yu-Kun; Zhang, Yuxiang; Liu, Yebin; Yang, Jingyu. Geometry-Guided Dense Perspective Network for Speech-Driven Facial Animation. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2022, 28(12): 4873-4886.
  • [24] Xing, Jinbo; Xia, Menghan; Zhang, Yuechen; Cun, Xiaodong; Wang, Jue; Wong, Tien-Tsin. CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 12780-12790.
  • [25] Stan, Stefan; Haque, Kazi Injamamul; Yumak, Zerrin. FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion. 15TH ANNUAL ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION AND GAMES, MIG 2023, 2023.
  • [26] Terissi, Lucas D.; Gomez, Juan Carlos. Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation. ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2008, PROCEEDINGS, 2008, 5249: 33-42.
  • [27] Fu, Hui; Wang, Zeqing; Gong, Ke; Wang, Keze; Chen, Tianshui; Li, Haojie; Zeng, Haifeng; Kang, Wenxiong. Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024: 1770-1777.
  • [28] Xu, Zhihao; Gong, Shengjie; Tang, Jiapeng; Liang, Lingyu; Huang, Yining; Li, Haojie; Huang, Shuangping. KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding. COMPUTER VISION - ECCV 2024, PT LVI, 2025, 15114: 236-253.
  • [29] Wu, Haozhe; Zhou, Songtao; Jia, Jia; Xing, Junliang; Wen, Qi; Wen, Xiang. Speech-Driven 3D Face Animation with Composite and Regional Facial Movements. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 6822-6830.
  • [30] Deena, Salil; Galata, Aphrodite. Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model. ADVANCES IN VISUAL COMPUTING, PT 1, PROCEEDINGS, 2009, 5875: 89-100.