Accurate visible speech synthesis based on concatenating variable length motion capture data

Cited by: 23
Authors
Ma, JY [1]
Cole, R [1]
Pellom, B [1]
Ward, W [1]
Wise, B [1]
Affiliations
[1] Univ Colorado, Ctr Spoken Language Res, Boulder, CO 80309 USA
Keywords
face animation; character animation; visual speech; visible speech; coarticulation effect; virtual human
DOI
10.1109/TVCG.2006.18
CLC Classification
TP31 [Computer Software]
Subject Classification
081202; 0835
Abstract
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. To model the long-distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated, and analyzed. For any input text, a search algorithm that locates the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on this approach. The system is currently used in more than 60 kindergarten through third-grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations were conducted. The results show that the proposed approach is accurate and powerful for visible speech synthesis.
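
The unit-selection search described in the abstract is, in generic concatenative synthesis, typically cast as a dynamic-programming (Viterbi) problem: each candidate unit pays a target cost for how well it matches the requested phonetic context, and a join cost for how smoothly it connects to its neighbor. The sketch below illustrates that general technique in Python; the `Unit` representation, the cost functions `target_cost` and `join_cost`, and the weight `w_join` are simplifying assumptions for illustration, not the cost model used by the authors.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    """A candidate motion-capture segment for one slot of the target sequence.
    Fields are illustrative stand-ins for real annotated corpus units."""
    frames: list   # motion data; scalars here, marker trajectories in practice
    label: str     # phonetic label the segment was cut from

def target_cost(unit: Unit, wanted_label: str) -> float:
    # Assumed target cost: zero on an exact phonetic match, one otherwise.
    return 0.0 if unit.label == wanted_label else 1.0

def join_cost(prev: Unit, nxt: Unit) -> float:
    # Assumed join cost: mismatch between the boundary frames. A real system
    # would compare marker positions and velocities across the junction.
    return abs(prev.frames[-1] - nxt.frames[0])

def select_units(wanted: list[str], candidates: list[list[Unit]],
                 w_join: float = 1.0) -> list[Unit]:
    """Viterbi search: choose one candidate per slot, minimizing the summed
    target costs plus the weighted join costs between consecutive units."""
    n = len(wanted)
    # best[i][j] = (cheapest cumulative cost ending in candidate j at slot i,
    #               backpointer to the chosen candidate at slot i-1)
    best = [[(target_cost(u, wanted[0]), -1) for u in candidates[0]]]
    for i in range(1, n):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, wanted[i])
            cost, back = min(
                (best[i - 1][k][0] + w_join * join_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Trace the cheapest path back through the lattice.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))

# Toy demo: two slots, scalar frames standing in for mocap trajectories.
cands = [
    [Unit([0.0, 1.0], "AA"), Unit([0.0, 3.0], "AA")],
    [Unit([1.1, 2.0], "B"), Unit([5.0, 6.0], "B")],
]
print(select_units(["AA", "B"], cands))  # picks the pair with the smoothest join
```

Because the paper's units are variable-length spans of motion capture data, a single unit may cover several slots of the target sequence; that extension enlarges the search lattice but leaves the dynamic-programming idea unchanged.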
Pages: 266-276
Page count: 11