A high quality text-to-speech system composed of multiple neural networks

Cited by: 0

Authors:

Karaali, O [1]

Corrigan, G [1]

Massey, N [1]

Miller, C [1]

Schnurr, O [1]

Mackie, A [1]

Affiliations:

[1] Motorola Inc, Speech Proc Lab, Schaumburg, IL 60196 USA

Source:

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 | 1998

Keywords:

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.
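The data flow described in the abstract can be sketched roughly as follows. This is only an illustration of how the five networks might be composed, not the authors' implementation: the class name `TtsPipeline`, the field names, the type signatures, and the stub lambdas are all hypothetical, since the abstract names the networks but does not specify their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type alias: the abstract only says "linguistic representation".
Phones = List[str]


@dataclass
class TtsPipeline:
    # Linguistic module: text -> linguistic representation.
    letter_to_sound: Callable[[str], Phones]
    postlexical: Callable[[Phones], Phones]
    # Acoustic module: linguistic representation -> speech.
    duration: Callable[[Phones], List[float]]
    phonetic: Callable[[Phones, List[float]], List[float]]
    # Visual network: linguistic representation -> animation frames.
    visual: Callable[[Phones], List[str]]

    def synthesize(self, text: str) -> Tuple[List[float], List[str]]:
        linguistic = self.postlexical(self.letter_to_sound(text))
        durations = self.duration(linguistic)
        speech = self.phonetic(linguistic, durations)
        # The abstract describes the visual network as running in parallel
        # with the acoustic module; this sketch calls it sequentially.
        frames = self.visual(linguistic)
        return speech, frames


# Stub callables standing in for trained networks (assumptions, not real models).
pipeline = TtsPipeline(
    letter_to_sound=lambda text: list(text.lower()),
    postlexical=lambda phones: [p for p in phones if p.isalpha()],
    duration=lambda phones: [0.1] * len(phones),
    phonetic=lambda phones, durs: [d * 2 for d in durs],
    visual=lambda phones: [f"frame:{p}" for p in phones],
)

speech, frames = pipeline.synthesize("Hi!")
```

Because each stage is a plain callable, any one network could in principle be swapped for a version retrained on another voice or language without touching the rest of the pipeline, which is the kind of adaptability the abstract claims.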

Pages: 1237-1240

Page count: 4
