A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen
Esposito, Anna
Garcia, Oscar N.
Gutierrez-Osuna, Ricardo [1 ]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminant analysis
DOI
10.1016/j.specom.2005.09.005
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques in terms of their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Band Features, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates between different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed with a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, outperforms any individual acoustic model for speech-driven facial animation. These results are validated on the 450 phonetically compact sentences of the TIMIT corpus. (C) 2005 Elsevier B.V. All rights reserved.
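To make the pipeline in the abstract concrete, the sketch below assembles its three stages in Python: perceptual acoustic coding (MFCCs are used here as a stand-in for the several feature sets the paper compares), a supervised projection with Fisher's Linear Discriminants, and non-parametric k-nearest-neighbors prediction of orofacial motion. This is a minimal illustration under stated assumptions, not the authors' implementation: the file names, the use of librosa and scikit-learn, the k-means quantization of orofacial configurations into class labels for the LDA step, and all parameter values are assumptions introduced for this example.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

# --- 1. Acoustic coding: 13 MFCCs per 10 ms frame (illustrative values).
audio, sr = librosa.load("utterance.wav", sr=16000)        # hypothetical file
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13,
                            hop_length=160, n_fft=400).T   # (frames, 13)

# Synchronized orofacial motion targets, e.g. facial marker coordinates
# per audio frame (placeholder; in practice tracked from video).
visual = np.load("orofacial_markers.npy")                  # (frames, n_markers)

# Keep only frames present in both streams so the arrays stay aligned.
n = min(len(mfcc), len(visual))
mfcc, visual = mfcc[:n], visual[:n]

# --- 2. Supervised dimensionality reduction with Fisher's LDA.
# LDA needs discrete classes; one convenient stand-in is to quantize the
# continuous orofacial configurations into clusters and discriminate
# between those clusters in the acoustic space.
labels = KMeans(n_clusters=16, n_init=10).fit_predict(visual)
lda = LinearDiscriminantAnalysis(n_components=8).fit(mfcc, labels)
mfcc_lda = lda.transform(mfcc)                             # (frames, 8)

# --- 3. Non-parametric audio-to-visual mapping via k-NN regression:
# each acoustic frame is mapped to the orofacial motion of its nearest
# training frames in the discriminant subspace.
knn = KNeighborsRegressor(n_neighbors=5).fit(mfcc_lda, visual)
predicted_motion = knn.predict(lda.transform(mfcc))        # (frames, n_markers)
```

The k-means step above is only one way to obtain the class labels that scikit-learn's LDA requires; the paper itself trains the discriminant projection to separate different orofacial configurations, so any reasonable discretization of the visual targets could play that role in this sketch.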
Pages: 598-615 (18 pages)