Live Speech Driven Head-and-Eye Motion Generators

Cited by: 50
Authors
Le, Binh H. [1 ]
Ma, Xiaohan [1 ]
Deng, Zhigang
Affiliations
[1] Univ Houston, Dept Comp Sci, Comp Graph Lab, Houston, TX 77204 USA
Funding
U.S. National Science Foundation;
Keywords
Facial animation; head and eye motion coupling; head motion synthesis; gaze synthesis; blinking model; live speech driven; ANIMATION; CAPTURE; MODEL; GAZE; PATTERNS; PROSODY; FACES;
DOI
10.1109/TVCG.2012.74
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
This paper describes a fully automated framework to simultaneously generate realistic head motion, eye gaze, and eyelid motion from live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, or eyelid motion) from a prerecorded facial motion data set: 1) a Gaussian Mixture Model and a gradient-descent optimization algorithm are employed to generate head motion from speech features; 2) a nonlinear Dynamic Canonical Correlation Analysis model is used to synthesize eye gaze from head motion and speech features; and 3) nonnegative linear regression is used to model voluntary eyelid motion, while a log-normal distribution is used to describe involuntary eye blinks. Several user studies based on the well-established paired-comparison methodology are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator. The evaluation results clearly show that this approach significantly outperforms state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap+video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.
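The abstract above outlines an algorithmic pipeline; the short sketch below illustrates two of its ingredients in a deliberately simplified form. It is not the authors' implementation: a conditional-expectation regression from a joint GMM stands in for the paper's GMM-plus-gradient-descent head-motion step, and the feature dimensions, component count, and log-normal blink parameters are assumed placeholders rather than values from the paper.

```python
# Minimal sketch, assuming placeholder dimensions and parameters (not from the paper):
# (1) a joint GMM over [speech features | head pose], queried for the expected head
#     pose given speech features;
# (2) involuntary blink onsets drawn from a log-normal inter-blink-interval model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# --- (1) joint GMM over [speech_features | head_pose] training vectors ---
speech_dim, pose_dim, n_components = 13, 3, 8                 # assumed sizes
train = rng.standard_normal((2000, speech_dim + pose_dim))    # stand-in for mocap data
gmm = GaussianMixture(n_components=n_components, covariance_type="full").fit(train)

def head_pose_from_speech(x_speech: np.ndarray) -> np.ndarray:
    """Conditional mean of head pose given a speech-feature vector under the GMM."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    s = slice(0, speech_dim)        # speech block of the joint vector
    p = slice(speech_dim, None)     # head-pose block of the joint vector
    resp = np.empty(n_components)
    cond_means = np.empty((n_components, pose_dim))
    for k in range(n_components):
        mu_s, mu_p = means[k, s], means[k, p]
        S_ss, S_ps = covs[k][s, s], covs[k][p, s]
        diff = x_speech - mu_s
        # responsibility of component k for the observed speech features
        # (constant (2*pi)^(-d/2) factor omitted; it cancels after normalization)
        resp[k] = weights[k] * np.exp(
            -0.5 * diff @ np.linalg.solve(S_ss, diff)
        ) / np.sqrt(np.linalg.det(S_ss))
        # conditional mean of the pose block given the speech block
        cond_means[k] = mu_p + S_ps @ np.linalg.solve(S_ss, diff)
    resp /= resp.sum()
    return resp @ cond_means        # expected head pose (e.g., Euler angles)

# --- (2) involuntary blink onsets from log-normal inter-blink intervals ---
def sample_blink_times(duration_s: float, mu: float = 1.0, sigma: float = 0.5):
    """Blink onset times over duration_s seconds; mu and sigma are assumed values."""
    t, onsets = 0.0, []
    while t < duration_s:
        t += rng.lognormal(mean=mu, sigma=sigma)   # gap until the next blink
        if t < duration_s:
            onsets.append(t)
    return onsets

print(head_pose_from_speech(rng.standard_normal(speech_dim)))
print(sample_blink_times(10.0))
```

The closed-form conditional expectation is only a stand-in; the paper additionally refines head motion via gradient-descent optimization and couples gaze to head motion with a nonlinear Dynamic CCA model, neither of which is shown here.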
Pages: 1902-1914
Number of pages: 13
Related Papers
(50 in total)
  • [31] Speech driven 3D head gesture synthesis
    Sargin, M. E.
    Erzin, E.
    Yemez, Y.
    Tekalp, A. M.
    Erdem, A. Tanju
    2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 237 - +
  • [32] SPEECH DRIVEN TALKING HEAD FROM ESTIMATED ARTICULATORY FEATURES
    Ben-Youssef, Atef
    Shimodaira, Hiroshi
    Braude, David A.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [33] Stable eye versus mouth preference in a live speech-processing task
    Viktorsson, Charlotte
    Valtakari, Niilo V.
    Falck-Ytter, Terje
    Hooge, Ignace T. C.
    Rudling, Maja
    Hessels, Roy S.
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [35] Speech-driven Lip Motion Generation with a Trajectory HMM
    Hofer, Gregor
    Yamagishi, Junichi
    Shimodaira, Hiroshi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2314 - 2317
  • [36] Speech, Head, and Eye-based Cues for Continuous Affect Prediction
    O'Dwyer, Jonny
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2019, : 16 - 20
  • [37] Natural head motion synthesis driven by acoustic prosodic features
    Busso, C
    Deng, ZG
    Neumann, U
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2005, 16 (3-4) : 283 - 290
  • [38] Head motion synthesis from speech using deep neural networks
    Ding, Chuang
    Xie, Lei
    Zhu, Pengcheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 : 9871 - 9888
  • [39] Analysis of relationship between head motion events and speech in dialogue conversations
    Ishi, Carlos Toshinori
    Ishiguro, Hiroshi
    Hagita, Norihiro
    SPEECH COMMUNICATION, 2014, 57 : 233 - 243
  • [40] An Embodied Entrainment Character Cell Phone by Speech and Head Motion Inputs
    Yamamoto, Michiya
    Osaki, Kouzi
    Matsune, Shotaro
    Watanabe, Tomio
    2010 IEEE RO-MAN, 2010, : 298 - 303