Text-to-visual speech synthesis based on parameter generation from HMM

Citations: 0
Authors
Masuko, T [1 ]
Kobayashi, T [1 ]
Tamura, M [1 ]
Masubuchi, J [1 ]
Tokuda, K [1 ]
Affiliations
[1] Tokyo Inst Technol, Precis & Intelligence Lab, Yokohama, Kanagawa 2268503, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a new technique for synthesizing visual speech from arbitrarily given text. The technique is based on an algorithm for parameter generation from HMMs with dynamic features, which has been successfully applied to text-to-speech synthesis. In the training phase, syllable HMMs are trained on visual speech parameter sequences that represent lip movements. In the synthesis phase, a sentence HMM is constructed by concatenating the syllable HMMs corresponding to the phonetic transcription of the input text. An optimum visual speech parameter sequence is then generated from the sentence HMM in the maximum-likelihood (ML) sense. The proposed technique can generate lip movements synchronized with speech in a unified framework, and coarticulation is implicitly incorporated into the generated mouth shapes. As a result, the synthesized lip motion is smooth and realistic.
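As a rough illustration only (not taken from the paper), the ML parameter-generation step summarized above reduces, for a fixed state sequence and diagonal covariances, to solving the linear system (W^T U^-1 W) c = W^T U^-1 m for the static visual-parameter trajectory c, where m and U are the stacked static-plus-delta means and variances along the sentence HMM and W is the window matrix that appends delta features. The Python sketch below assumes a single visual-parameter stream and uses a hypothetical generate_trajectory helper; it is a minimal sketch of that closed-form solution, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of ML parameter generation
# from an HMM with dynamic features: solve (W^T U^-1 W) c = W^T U^-1 m,
# where c is the static trajectory, m and U are the static+delta means and
# diagonal variances along the sentence-HMM state sequence, and W stacks
# the delta windows.
import numpy as np

def generate_trajectory(means, variances, delta_window=(-0.5, 0.0, 0.5)):
    """means, variances: shape (T, 2) arrays holding the static and delta
    statistics per frame for one visual parameter (e.g. lip opening),
    read off the chosen state sequence of the sentence HMM."""
    T = means.shape[0]
    half = len(delta_window) // 2

    # W maps the static trajectory c (length T) to the stacked
    # static+delta observation vector (length 2T).
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                           # static row
        for j, w in enumerate(delta_window):        # delta row
            tau = min(max(t + j - half, 0), T - 1)  # clip at utterance edges
            W[2 * t + 1, tau] += w

    m = means.reshape(-1)                   # [static_0, delta_0, static_1, ...]
    precision = 1.0 / variances.reshape(-1)

    A = W.T @ (precision[:, None] * W)      # W^T U^-1 W
    b = W.T @ (precision * m)               # W^T U^-1 m
    return np.linalg.solve(A, b)            # ML static trajectory c

# Dummy usage with random statistics for a 5-frame state sequence:
# rng = np.random.default_rng(0)
# c = generate_trajectory(rng.normal(size=(5, 2)), np.ones((5, 2)) * 0.1)
```

Because the system is solved jointly over all frames, the delta constraints couple neighboring frames, which is what produces the smooth, coarticulated trajectories described in the abstract rather than a stepwise sequence of state means.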
Pages: 3745 - 3748
Number of pages: 4
Related papers
50 records in total
  • [41] Subjective analysis of an HMM-based visual speech synthesizer
    Williams, JJ
    Katsaggelos, AK
    Garstecki, DC
    HUMAN VISION AND ELECTRONIC IMAGING VI, 2001, 4299 : 544 - 555
  • [42] DEMONSTRATION OF AN HMM-BASED PHOTOREALISTIC EXPRESSIVE AUDIO-VISUAL SPEECH SYNTHESIS SYSTEM
    Filntisis, Panagiotis Paraskevas
    Katsamanis, Athanasios
    Maragos, Petros
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 4588 - 4588
  • [43] Normalized training for HMM-based visual speech recognition
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    Kitamura, Tadashi
    Kobayashi, Takao
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 2006, 89 (11): 40 - 50
  • [44] Normalized training for HMM-based visual speech recognition
    Nankaku, Y
    Tokuda, K
    Kitamura, T
    Kobayashi, T
    2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2000, : 234 - 237
  • [45] HMM-based distributed text-to-speech synthesis incorporating speaker-adaptive training
    Jeon, Kwang Myung
    Choi, Seung Ho
    International Journal of Multimedia and Ubiquitous Engineering, 2014, 9 (05): 107 - 119
  • [46] A Novel Text-to-Speech Synthesis System Using Syllable-Based HMM for Tamil Language
    Manoharan, J. Samuel
    PROCEEDINGS OF SECOND INTERNATIONAL CONFERENCE ON SUSTAINABLE EXPERT SYSTEMS (ICSES 2021), 2022, 351 : 305 - 314
  • [47] Minimum generation error linear regression based model adaptation for HMM-based speech synthesis
    Qin, Long
    Wu, Yi-Jian
    Ling, Zhen-Hua
    Wang, Ren-Hua
    Dai, Li-Rong
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3953 - +
  • [48] Speaker-adaptive visual speech synthesis in the HMM-framework
    Schabus, Dietmar
    Pucher, Michael
    Hofer, Gregor
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 978 - 981
  • [49] Synthesis of stressed speech from isolated neutral speech using HMM-based models
    BouGhazale, SE
    Hansen, JHL
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1860 - 1863
  • [50] An HMM-based Mandarin Chinese Text-to-Speech system
    Qian, Yao
    Soong, Frank
    Chen, Yining
    Chu, Min
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 223 - +