A comparison of acoustic coding models for speech-driven facial animation

Cited by: 12
Authors
Kakumanu, Praveen
Esposito, Anna
Garcia, Oscar N.
Gutierrez-Osuna, Ricardo [1]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
[2] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[3] Univ Naples 2, Dept Psychol, Naples, Italy
[4] Univ N Texas, Coll Engn, Denton, TX 76203 USA
[5] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
Keywords
speech-driven facial animation; audio-visual mapping; linear discriminants analysis
DOI
10.1016/j.specom.2005.09.005
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, is able to outperform any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. © 2005 Elsevier B.V. All rights reserved.
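The k-nearest-neighbors prediction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_predict`, the Euclidean distance, the uniform averaging of neighbors, and the toy data are all assumptions made for illustration, and the acoustic features are assumed to have already been extracted (and, if desired, projected to a low-dimensional subspace).

```python
import math

def knn_predict(train_acoustic, train_visual, query, k=3):
    """Average the visual vectors of the k training frames nearest to `query`.

    Hypothetical helper: Euclidean distance and uniform averaging are
    assumptions, not details taken from the abstract.
    """
    if not 1 <= k <= len(train_acoustic):
        raise ValueError("k must be between 1 and the number of training frames")
    # Rank training frames by distance to the query in acoustic feature space.
    ranked = sorted(
        range(len(train_acoustic)),
        key=lambda i: math.dist(train_acoustic[i], query),
    )
    nearest = ranked[:k]
    # Average each visual dimension over the k nearest frames.
    dims = len(train_visual[0])
    return [sum(train_visual[i][d] for i in nearest) / k for d in range(dims)]

# Toy data (made up): 2-D acoustic features paired with a 1-D "mouth opening".
train_acoustic = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
train_visual = [[0.0], [0.2], [1.0], [0.8]]
prediction = knn_predict(train_acoustic, train_visual, [0.05, 0.0], k=2)
```

Because the method is non-parametric, the training frames themselves act as the model; the choice of `k` trades smoothness of the predicted motion against fidelity to individual training examples.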
Pages: 598-615
Page count: 18