A high quality text-to-speech system composed of multiple neural networks

Authors
Karaali, O [1]
Corrigan, G [1]
Massey, N [1]
Miller, C [1]
Schnurr, O [1]
Mackie, A [1]
Affiliations
[1] Motorola Inc, Speech Proc Lab, Schaumburg, IL 60196 USA
Keywords
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The video module's visual neural network runs in parallel with the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.
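The data flow described in the abstract can be sketched roughly as follows. This is only an illustration of how the five networks might be wired together, not the authors' implementation: every class, function, and parameter name is hypothetical, and each trained neural network is replaced by a trivial stub function so that the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TimedPhone:
    """A phone symbol with a predicted duration (hypothetical representation)."""
    symbol: str
    duration_ms: float


@dataclass
class TextToSpeechPipeline:
    """Wires five networks into linguistic, acoustic, and video modules.

    Each field is a plain callable standing in for a trained neural network.
    """
    letter_to_sound: Callable[[str], List[str]]
    postlexical: Callable[[List[str]], List[str]]
    duration: Callable[[List[str]], List[float]]
    phonetic: Callable[[List[TimedPhone]], List[str]]
    visual: Callable[[List[str]], List[str]]

    def linguistic(self, text: str) -> List[str]:
        """Linguistic module: text -> linguistic representation."""
        phones: List[str] = []
        for word in text.split():
            phones.extend(self.letter_to_sound(word))
        return self.postlexical(phones)

    def acoustic(self, phones: List[str]) -> List[str]:
        """Acoustic module: linguistic representation -> speech frames."""
        durations = self.duration(phones)
        timed = [TimedPhone(p, d) for p, d in zip(phones, durations)]
        return self.phonetic(timed)

    def synthesize(self, text: str):
        """Run the acoustic and video modules on the same linguistic output."""
        phones = self.linguistic(text)
        speech = self.acoustic(phones)
        images = self.visual(phones)  # consumes the same representation as the acoustic module
        return speech, images


# Stub "networks" (assumptions for illustration only).
def stub_phonetic(timed: List[TimedPhone]) -> List[str]:
    # Emit one placeholder frame per 10 ms of predicted duration.
    return [p.symbol for p in timed for _ in range(int(p.duration_ms // 10))]


pipeline = TextToSpeechPipeline(
    letter_to_sound=lambda word: list(word.upper()),  # one symbol per letter
    postlexical=lambda phones: phones,                # identity: no adjustment
    duration=lambda phones: [50.0] * len(phones),     # constant 50 ms per phone
    phonetic=stub_phonetic,
    visual=lambda phones: [f"viseme:{p}" for p in phones],
)
```

Calling `pipeline.synthesize("hi")` returns a list of placeholder speech frames and a list of placeholder image labels. In a real system each stub would be a trained model, which is what would allow retraining for a different voice or language without changing the wiring.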
Pages: 1237-1240
Number of pages: 4