Text-to-visual speech synthesis based on parameter generation from HMM

Cited by: 0

Authors:

Masuko, T [1]

Kobayashi, T [1]

Tamura, M [1]

Masubuchi, J [1]

Tokuda, K [1]

Affiliations:

[1] Tokyo Institute of Technology, Precision and Intelligence Laboratory, Yokohama, Kanagawa 226-8503, Japan

Source:

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a new technique for synthesizing visual speech from arbitrarily given text. The technique is based on an algorithm for parameter generation from HMMs with dynamic features, which has been successfully applied to text-to-speech synthesis. In the training phase, syllable HMMs are trained with visual speech parameter sequences that represent lip movements. In the synthesis phase, a sentence HMM is constructed by concatenating the syllable HMMs corresponding to the phonetic transcription of the input text. An optimum visual speech parameter sequence is then generated from the sentence HMM in the maximum likelihood (ML) sense. The proposed technique can generate lip movements synchronized with speech in a unified framework. Furthermore, coarticulation is implicitly incorporated into the generated mouth shapes. As a result, the synthetic lip motion becomes smooth and realistic.
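
The generation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a one-dimensional lip parameter, diagonal covariances, a fixed state sequence with hand-picked durations, and a simple central-difference delta window, none of which are specified in this record. The names `build_window_matrix` and `generate_trajectory` and all numeric values are hypothetical. Under those assumptions, the ML trajectory `c` solves the linear system `(W' P W) c = W' P mu`, where `W` maps static parameters to stacked static and delta features and `P` is the diagonal precision matrix.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Return W of shape (2T, T) so that W @ c stacks [static, delta] per frame."""
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0  # static feature: c[t]
        # Assumed delta window: 0.5 * (c[t+1] - c[t-1]), indices clamped at the edges.
        W[2 * t + 1, max(t - 1, 0)] -= 0.5
        W[2 * t + 1, min(t + 1, num_frames - 1)] += 0.5
    return W


def generate_trajectory(means, variances):
    """Solve (W' P W) c = W' P mu for the static trajectory c.

    means, variances: arrays of shape (T, 2) holding per-frame Gaussian
    statistics for the [static, delta] features (diagonal covariance assumed).
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames)
    mu = means.reshape(-1)                   # [s0, d0, s1, d1, ...]
    precision = 1.0 / variances.reshape(-1)  # diagonal of P
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


# Toy "sentence HMM": two concatenated states with hypothetical statistics.
state_means = np.array([[0.0, 0.0],   # e.g. lips closed: static mean 0, delta mean 0
                        [1.0, 0.0]])  # e.g. lips open:   static mean 1, delta mean 0
state_vars = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
durations = [5, 5]                    # frames spent in each state (assumed)

# Expand per-state statistics to per-frame statistics along the state sequence.
means = np.repeat(state_means, durations, axis=0)
variances = np.repeat(state_vars, durations, axis=0)

trajectory = generate_trajectory(means, variances)
print(np.round(trajectory, 3))
```

With these toy values, the abrupt step between the two state means should come out as a gradual transition, which is the smoothing effect the abstract attributes to dynamic features. Setting the delta variances very large effectively removes the dynamic constraint, and the result falls back to the piecewise-constant static means.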

Pages: 3745-3748

Page count: 4
