Live Speech Driven Head-and-Eye Motion Generators

Cited by: 50
Authors
Le, Binh H. [1 ]
Ma, Xiaohan [1 ]
Deng, Zhigang
Affiliations
[1] Univ Houston, Dept Comp Sci, Comp Graph Lab, Houston, TX 77204 USA
Funding
U.S. National Science Foundation;
Keywords
Facial animation; head and eye motion coupling; head motion synthesis; gaze synthesis; blinking model; live speech driven; ANIMATION; CAPTURE; MODEL; GAZE; PATTERNS; PROSODY; FACES;
DOI
10.1109/TVCG.2012.74
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
This paper describes a fully automated framework to simultaneously generate realistic head motion, eye gaze, and eyelid motion from live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, or eyelid motion) from a prerecorded facial motion data set: 1) a Gaussian Mixture Model and a gradient-descent optimization algorithm are employed to generate head motion from speech features; 2) a nonlinear Dynamic Canonical Correlation Analysis model is used to synthesize eye gaze from head motion and speech features; and 3) nonnegative linear regression is used to model voluntary eyelid motion, while a log-normal distribution is used to describe involuntary eye blinks. Several user studies based on the well-established paired-comparison methodology are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator. The evaluation results clearly show that this approach significantly outperforms state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap+video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.
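The abstract above outlines an algorithmic pipeline; the short sketch below illustrates two of its ingredients in a deliberately simplified form. It is not the authors' implementation: a conditional-expectation regression from a joint GMM stands in for the paper's GMM-plus-gradient-descent head-motion step, and the feature dimensions, component count, and log-normal blink parameters are assumed placeholders rather than values from the paper.

```python
# Minimal sketch, assuming placeholder dimensions and parameters (not from the paper):
# (1) a joint GMM over [speech features | head pose], queried for the expected head
#     pose given speech features;
# (2) involuntary blink onsets drawn from a log-normal inter-blink-interval model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# --- (1) joint GMM over [speech_features | head_pose] training vectors ---
speech_dim, pose_dim, n_components = 13, 3, 8                 # assumed sizes
train = rng.standard_normal((2000, speech_dim + pose_dim))    # stand-in for mocap data
gmm = GaussianMixture(n_components=n_components, covariance_type="full").fit(train)

def head_pose_from_speech(x_speech: np.ndarray) -> np.ndarray:
    """Conditional mean of head pose given a speech-feature vector under the GMM."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    s = slice(0, speech_dim)        # speech block of the joint vector
    p = slice(speech_dim, None)     # head-pose block of the joint vector
    resp = np.empty(n_components)
    cond_means = np.empty((n_components, pose_dim))
    for k in range(n_components):
        mu_s, mu_p = means[k, s], means[k, p]
        S_ss, S_ps = covs[k][s, s], covs[k][p, s]
        diff = x_speech - mu_s
        # responsibility of component k for the observed speech features
        # (constant (2*pi)^(-d/2) factor omitted; it cancels after normalization)
        resp[k] = weights[k] * np.exp(
            -0.5 * diff @ np.linalg.solve(S_ss, diff)
        ) / np.sqrt(np.linalg.det(S_ss))
        # conditional mean of the pose block given the speech block
        cond_means[k] = mu_p + S_ps @ np.linalg.solve(S_ss, diff)
    resp /= resp.sum()
    return resp @ cond_means        # expected head pose (e.g., Euler angles)

# --- (2) involuntary blink onsets from log-normal inter-blink intervals ---
def sample_blink_times(duration_s: float, mu: float = 1.0, sigma: float = 0.5):
    """Blink onset times over duration_s seconds; mu and sigma are assumed values."""
    t, onsets = 0.0, []
    while t < duration_s:
        t += rng.lognormal(mean=mu, sigma=sigma)   # gap until the next blink
        if t < duration_s:
            onsets.append(t)
    return onsets

print(head_pose_from_speech(rng.standard_normal(speech_dim)))
print(sample_blink_times(10.0))
```

The closed-form conditional expectation is only a stand-in; the paper additionally refines head motion via gradient-descent optimization and couples gaze to head motion with a nonlinear Dynamic CCA model, neither of which is shown here.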
Pages: 1902-1914
Number of pages: 13
Related Papers
(50 in total)
  • [31] Speech driven 3D head gesture synthesis
    Sargin, M. E.
    Erzin, E.
    Yemez, Y.
    Tekalp, A. M.
    Erdem, A. Tanju
    2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 237 - +
  • [32] SPEECH DRIVEN TALKING HEAD FROM ESTIMATED ARTICULATORY FEATURES
    Ben-Youssef, Atef
    Shimodaira, Hiroshi
    Braude, David A.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [33] Stable eye versus mouth preference in a live speech-processing task
    Viktorsson, Charlotte
    Valtakari, Niilo V.
    Falck-Ytter, Terje
    Hooge, Ignace T. C.
    Rudling, Maja
    Hessels, Roy S.
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [35] Speech-driven Lip Motion Generation with a Trajectory HMM
    Hofer, Gregor
    Yamagishi, Junichi
    Shimodaira, Hiroshi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2314 - 2317
  • [36] Speech, Head, and Eye-based Cues for Continuous Affect Prediction
    O'Dwyer, Jonny
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2019, : 16 - 20
  • [37] Natural head motion synthesis driven by acoustic prosodic features
    Busso, C
    Deng, ZG
    Neumann, U
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2005, 16 (3-4) : 283 - 290
  • [38] Head motion synthesis from speech using deep neural networks
    Ding, Chuang
    Xie, Lei
    Zhu, Pengcheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 : 9871 - 9888
  • [39] Analysis of relationship between head motion events and speech in dialogue conversations
    Ishi, Carlos Toshinori
    Ishiguro, Hiroshi
    Hagita, Norihiro
    SPEECH COMMUNICATION, 2014, 57 : 233 - 243
  • [40] An Embodied Entrainment Character Cell Phone by Speech and Head Motion Inputs
    Yamamoto, Michiya
    Osaki, Kouzi
    Matsune, Shotaro
    Watanabe, Tomio
    2010 IEEE RO-MAN, 2010, : 298 - 303