A high quality text-to-speech system composed of multiple neural networks

Authors
Karaali, O [1]
Corrigan, G [1]
Massey, N [1]
Miller, C [1]
Schnurr, O [1]
Mackie, A [1]
Affiliation
[1] Motorola Inc, Speech Proc Lab, Schaumburg, IL 60196 USA
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module that maps text to a linguistic representation, an acoustic module that maps the linguistic representation to speech, and a video module that maps the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The video module employs a visual neural network, which runs in parallel with the acoustic module to drive a talking head. Because the neural networks can be retrained on the characteristics of different voices and languages, our system offers a degree of adaptability and naturalness heretofore unavailable.
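The module layout described in the abstract can be sketched as a simple data flow. This is only an illustrative sketch, not the authors' implementation: every class, function, and parameter name below is hypothetical, and each "network" is replaced by a toy callable so that the wiring between modules is visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Phones = List[str]          # assumed form of the "linguistic representation"
Frame = List[float]         # assumed form of one frame of acoustic parameters


@dataclass
class LinguisticModule:
    letter_to_sound: Callable[[str], Phones]   # word -> phones (stand-in for a network)
    postlexical: Callable[[Phones], Phones]    # phones -> adjusted phones (stand-in)

    def __call__(self, text: str) -> Phones:
        phones: Phones = []
        for word in text.lower().split():
            phones.extend(self.letter_to_sound(word))
        return self.postlexical(phones)


@dataclass
class AcousticModule:
    duration: Callable[[Phones], List[int]]    # phones -> frame count per phone (stand-in)
    phonetic: Callable[[str], Frame]           # phone -> one acoustic frame (stand-in)

    def __call__(self, phones: Phones) -> List[Frame]:
        frames: List[Frame] = []
        for phone, n_frames in zip(phones, self.duration(phones)):
            # Repeat the per-phone frame for as long as the duration model says.
            frames.extend([self.phonetic(phone)] * n_frames)
        return frames


def synthesize(text: str,
               linguistic: LinguisticModule,
               acoustic: AcousticModule,
               visual: Callable[[Phones], List[str]]) -> Tuple[List[Frame], List[str]]:
    """Run the linguistic module once, then feed its output to both branches."""
    representation = linguistic(text)
    return acoustic(representation), visual(representation)


# Toy stand-ins (assumptions for illustration only, not trained models).
toy_linguistic = LinguisticModule(
    letter_to_sound=lambda word: list(word),   # pretend each letter is a phone
    postlexical=lambda phones: phones,         # identity: no adjustment
)
toy_acoustic = AcousticModule(
    duration=lambda phones: [2] * len(phones), # two frames per phone
    phonetic=lambda phone: [float(ord(phone))],
)
toy_visual = lambda phones: [f"viseme:{p}" for p in phones]

speech_frames, video_frames = synthesize("hi", toy_linguistic, toy_acoustic, toy_visual)
```

The point of the sketch is the branching: the acoustic and visual paths both consume the same linguistic representation, which is what lets them run in parallel.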
Pages: 1237-1240
Page count: 4