Text-to-visual speech synthesis based on parameter generation from HMM

Authors
Masuko, T [1 ]
Kobayashi, T [1 ]
Tamura, M [1 ]
Masubuchi, J [1 ]
Tokuda, K [1 ]
Affiliation
[1] Tokyo Inst Technol, Precis & Intelligence Lab, Yokohama, Kanagawa 2268503, Japan
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a new technique for synthesizing visual speech from arbitrary input text. The technique is based on an algorithm for parameter generation from HMMs with dynamic features, which has been successfully applied to text-to-speech synthesis. In the training phase, syllable HMMs are trained with visual speech parameter sequences that represent lip movements. In the synthesis phase, a sentence HMM is constructed by concatenating the syllable HMMs corresponding to the phonetic transcription of the input text. An optimum visual speech parameter sequence is then generated from the sentence HMM in the maximum likelihood (ML) sense. The proposed technique can generate lip movements synchronized with speech in a unified framework. Furthermore, coarticulation is implicitly incorporated into the generated mouth shapes. As a result, the synthetic lip motion is smooth and realistic.
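The abstract gives no equations, so the following is only a minimal sketch of how ML parameter generation with dynamic features is commonly formulated, not the authors' implementation. It assumes a single one-dimensional lip parameter, a fixed state sequence, diagonal covariances, and a delta feature defined as half the difference between the next and previous frames. The function names and the toy state means are hypothetical. Under these assumptions the static trajectory `c` that maximizes the likelihood solves the linear system `W^T P W c = W^T P mu`, where `W` stacks the static and delta windows and `P` holds the inverse variances.

```python
import numpy as np


def delta_window_matrix(num_frames):
    """Build W (2T x T) mapping a static trajectory to interleaved
    [static, delta] observations. The delta window is assumed to be
    0.5 * (c[t+1] - c[t-1]), with indices clamped at the edges."""
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0  # static feature row
        prev_t = max(t - 1, 0)
        next_t = min(t + 1, num_frames - 1)
        W[2 * t + 1, prev_t] -= 0.5  # delta feature row
        W[2 * t + 1, next_t] += 0.5
    return W


def generate_trajectory(means, variances):
    """Return the static trajectory that maximizes the Gaussian likelihood.

    means, variances: arrays of shape (T, 2) holding the per-frame
    [static, delta] mean and variance along a fixed state sequence.
    """
    num_frames = means.shape[0]
    W = delta_window_matrix(num_frames)
    mu = means.reshape(-1)                # interleaved [s0, d0, s1, d1, ...]
    precision = 1.0 / variances.reshape(-1)
    A = W.T @ (precision[:, None] * W)    # W^T P W
    b = W.T @ (precision * mu)            # W^T P mu
    return np.linalg.solve(A, b)


# Toy example: two states (say, closed and open lips), five frames each.
# Static means step from 0 to 1; delta means are 0 in both states.
means = np.array([[0.0, 0.0]] * 5 + [[1.0, 0.0]] * 5)
variances = np.array([[0.1, 0.05]] * 10)
trajectory = generate_trajectory(means, variances)
print(np.round(trajectory, 3))
```

Because the delta terms penalize abrupt frame-to-frame changes, the generated trajectory moves gradually between the two state means instead of stepping, which is one way to read the abstract's remark that coarticulation is incorporated implicitly.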
Pages: 3745-3748
Page count: 4
Related papers
50 in total
  • [1] Text-to-Visual Speech Synthesis
    Bournemouth University, Fern Barrow, Poole, BH12 5BB, United Kingdom
    Inf, 4 (445-450):
  • [2] Text-to-visual speech synthesis for general objects using parameter-based lip models
    Chuang, ZJ
    Wu, CH
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2002, PROCEEDING, 2002, 2532 : 589 - 597
  • [3] Speech parameter generation algorithms for HMM-based speech synthesis
    Tokuda, K
    Yoshimura, T
    Masuko, T
    Kobayashi, T
    Kitamura, T
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1315 - 1318
  • [4] The Kinematical Analysis of Virtual Robot in Text-to-Visual Speech Synthesis
    Yang, Zhixiao
    Han, Jinchi
    Ding, Mi
    2009 IEEE INTERNATIONAL CONFERENCE ON VIRTUAL ENVIRONMENTS, HUMAN-COMPUTER INTERFACES AND MEASUREMENT SYSTEMS, 2009, : 159 - 163
  • [5] Evaluating Text-to-Visual Generation with Image-to-Text Generation
    Lin, Zhiqiu
    Pathak, Deepak
    Li, Baiqi
    Li, Jiayao
    Xia, Xide
    Neubig, Graham
    Zhang, Pengchuan
    Ramanan, Deva
    COMPUTER VISION - ECCV 2024, PT IX, 2025, 15067 : 366 - 384
  • [6] A 3D Communication Platform based on Text-to-Visual Speech Sythesis
    Yang Zhixiao
    Sui Fei
    Zhang Dexian
    2009 IEEE INTERNATIONAL CONFERENCE ON VIRTUAL ENVIRONMENTS, HUMAN-COMPUTER INTERFACES AND MEASUREMENT SYSTEMS, 2009, : 22 - 26
  • [7] A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    Toda, Tomoki
    Tokuda, Keiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05): : 816 - 824
  • [8] SPEECH PARAMETER GENERATION CONSIDERING LSP ORDERING PROPERTY FOR HMM-BASED SPEECH SYNTHESIS
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Zou, Ping
    Wang, Kai
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 330 - 334
  • [9] A speech parameter generation algorithm using local variance for HMM-based speech synthesis
    Chunwijitra, Vataya
    Nose, Takashi
    Kobayashi, Takao
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1150 - 1153
  • [10] Parameter Generation Considering LSP Ordering Property for HMM-Based Speech Synthesis
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Wang, Kai
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (08) : 467 - 470