A high quality text-to-speech system composed of multiple neural networks

Cited by: 0

Authors:

Karaali, O [1]

Corrigan, G [1]

Massey, N [1]

Miller, C [1]

Schnurr, O [1]

Mackie, A [1]

Affiliations:

[1] Motorola Inc, Speech Proc Lab, Schaumburg, IL 60196 USA

Source:

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 | 1998

Keywords:

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.

Pages: 1237-1240

Number of pages: 4

50 items in total

[1] Text-To-Speech quality evaluation based on LSTM Recurrent Neural Networks
Tang, Meng
Zhu, Jie
2019 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS (ICNC), 2019: 260-264
[2] Neural networks for text-to-speech phoneme recognition
Embrechts, MJ
Arciniegas, F
SMC 2000 CONFERENCE PROCEEDINGS: 2000 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOL 1-5, 2000: 3582-3587
[3] Application of neural networks to duration modeling in a Spanish text-to-speech system
Córdoba, R.
Montero, J.M.
Pardo, J.M.
Advances in Systems Engineering, Signal Processing and Communications, 2002: 244-247
[4] CATOTRON - A Neural Text-to-Speech System in Catalan
Kulebi, Baybars
Oktem, Alp
Peiro-Lilja, Alex
Pascual, Santiago
Farrus, Mireia
INTERSPEECH 2020, 2020: 490-491
[5] Neural networks in text-to-speech systems for the Greek language
Falas, T
Stafylopatis, AG
MELECON 2000: INFORMATION TECHNOLOGY AND ELECTROTECHNOLOGY FOR THE MEDITERRANEAN COUNTRIES, VOLS 1-3, PROCEEDINGS, 2000: 574-577
[6] High-quality prosody generation in Mandarin text-to-speech system
Guo, Qing
Zhang, Jie
Katae, Nobuyuki
Yu, Hao
Fujitsu Scientific and Technical Journal, 2010, 46(01): 40-46
[7] High-Quality Prosody Generation in Mandarin Text-to-Speech System
Guo, Qing
Zhang, Jie
Katae, Nobuyuki
Yu, Hao
FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 2010, 46(01): 40-46
[8] High-quality text-to-speech synthesis: An overview
Dutoit, T.
Journal of Electrical and Electronics Engineering, Australia, 1997, 17(01): 25-36
[9] On a cepstral technique for pitch control in the high quality text-to-speech type system
Bae, MJ
Lee, SH
PROCEEDINGS OF THE 39TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS I-III, 1996: 803-806
[10] SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Wang, Kexin
Zhang, Jiahong
Ren, Yong
Yao, Man
Di Shang
Xu, Bo
Li, Guoqi
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 7927-7940
