Accurate automatic visible speech synthesis of arbitrary 3D models based on concatenation of diviseme motion capture data

Cited by: 17
Authors
Ma, JY [1]
Cole, R [1]
Pellom, B [1]
Ward, W [1]
Wise, B [1]
Affiliations
[1] Univ Colorado, Ctr Spoken Language Res, Boulder, CO 80309 USA
Keywords
visible speech; visual speech synthesis; animated speech; coarticulation modelling; speech animation; face animation;
DOI
10.1002/cav.11
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
We present a technique for accurate automatic visible speech synthesis from textual input. When provided with a speech waveform and the text of a spoken sentence, the system produces accurate visible speech synchronized with the audio signal. To develop the system, we collected motion capture data from a speaker's face during production of a set of words containing all diviseme sequences in English. The motion capture points from the speaker's face are retargeted to the vertices of the polygons of a 3D face model. When synthesizing a new utterance, the system locates the required sequence of divisemes, shrinks or expands each diviseme based on the desired phoneme segment durations in the target utterance, then moves the polygons in the regions of the lips and lower face to correspond to the spatial coordinates of the motion capture data. The motion mapping is realized by a key-shape mapping function learned from a set of viseme examples in the source and target faces. A well-posed numerical algorithm estimates the shape blending coefficients. Time warping and motion vector blending at the juncture of two divisemes, and an algorithm to search for the optimal concatenated visible speech, are also developed to provide the final concatenative motion sequence. Copyright © 2004 John Wiley & Sons, Ltd.
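The two concatenation steps named in the abstract, stretching or shrinking a diviseme to a target duration and blending motion vectors at the juncture of two divisemes, can be illustrated with a minimal Python sketch. The abstract does not specify the warping function or the blending weights, so the linear resampling and linear cross-fade below are assumptions, and the names `time_warp` and `concatenate` are hypothetical rather than taken from the paper.

```python
from typing import List

# One frame = flattened marker coordinates at one time step (assumed layout).
Frame = List[float]


def time_warp(frames: List[Frame], target_len: int) -> List[Frame]:
    """Linearly resample a motion segment to target_len frames.

    Assumption: uniform linear time warping; the paper may use another scheme.
    """
    if not frames or target_len < 1:
        raise ValueError("need at least one frame and target_len >= 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def concatenate(left: List[Frame], right: List[Frame], overlap: int) -> List[Frame]:
    """Join two segments, cross-fading `overlap` frames at the juncture.

    Assumption: linear blending weights that ramp from left to right.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both segments")
    head = left[: len(left) - overlap]
    tail = right[overlap:]
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # weight given to the right segment
        a = left[len(left) - overlap + k]
        b = right[k]
        blended.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    return head + blended + tail
```

For example, `time_warp(segment, 5)` expands a three-frame segment to five frames, and `concatenate(a, b, 2)` returns `len(a) + len(b) - 2` frames with the two juncture frames blended.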
Pages: 485-500
Page count: 16