A high quality text-to-speech system composed of multiple neural networks

Cited by: 0

Authors:

Karaali, O [1]
Corrigan, G [1]
Massey, N [1]
Miller, C [1]
Schnurr, O [1]
Mackie, A [1]

Affiliation:

[1] Motorola Inc, Speech Proc Lab, Schaumburg, IL 60196 USA

Source:

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 | 1998

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The video module employs a visual neural network, which runs in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.
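The module layout described in the abstract can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the class name, method names, and data types are hypothetical, each callable stands in for a trained neural network, and the assumption that the phonetic network consumes the duration network's output is ours, since the abstract does not state how the two are connected.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types: the abstract does not specify the actual representations.
Phones = List[str]        # stand-in for the linguistic representation
Durations = List[float]   # stand-in for per-segment durations
Speech = List[float]      # stand-in for acoustic output
Frames = List[str]        # stand-in for animated talking-head images


@dataclass
class TextToSpeechPipeline:
    """Wires five placeholder networks into the three modules named in the abstract."""
    letter_to_sound: Callable[[str], Phones]
    postlexical: Callable[[Phones], Phones]
    duration: Callable[[Phones], Durations]
    phonetic: Callable[[Phones, Durations], Speech]
    visual: Callable[[Phones], Frames]

    def linguistic_module(self, text: str) -> Phones:
        # Text -> linguistic representation (letter-to-sound, then postlexical).
        return self.postlexical(self.letter_to_sound(text))

    def acoustic_module(self, phones: Phones) -> Speech:
        # Linguistic representation -> speech.
        # Assumption: durations are predicted first and passed to the phonetic network.
        return self.phonetic(phones, self.duration(phones))

    def video_module(self, phones: Phones) -> Frames:
        # Linguistic representation -> animated images.
        return self.visual(phones)

    def synthesize(self, text: str) -> Tuple[Speech, Frames]:
        phones = self.linguistic_module(text)
        # The acoustic and video modules both read the same linguistic representation.
        return self.acoustic_module(phones), self.video_module(phones)


# Toy stand-ins so the sketch runs; real networks would replace these lambdas.
pipeline = TextToSpeechPipeline(
    letter_to_sound=lambda text: list(text.lower().replace(" ", "")),
    postlexical=lambda phones: phones,
    duration=lambda phones: [0.1] * len(phones),
    phonetic=lambda phones, durs: [len(p) * d for p, d in zip(phones, durs)],
    visual=lambda phones: [f"frame:{p}" for p in phones],
)
speech, frames = pipeline.synthesize("Hi")
```

Because every stage is an injected callable, swapping in a network retrained for a different voice or language would only change the constructor arguments, which mirrors the adaptability the abstract claims.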


Pages: 1237-1240

Page count: 4
