A Mandarin text-to-speech system

Authors
Hwang, SH
Chen, SH
Wang, YR
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, the implementation of a high-performance Mandarin TTS system is presented. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table (WT) of 411 base-syllables, and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical-model-based method is first employed to automatically tag the input text, yielding the word sequence and the associated part-of-speech (POS) sequence. A lexicon of about 80,000 words is used in the tagging process. The corresponding base-syllable sequence is then determined and used to retrieve the basic waveform sequence from WT. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information, including the pitch contour, energy level, initial duration, and final duration of each syllable, as well as the inter-syllable pause duration. Finally, in PSOLA, the basic waveform sequence is modified using this prosodic information to generate the output synthetic speech. The whole system is implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card and requires only 3.2 Mbytes of memory. It can synthesize speech in real time for any input Chinese text. Informal listening tests by many native Chinese speakers living in Taiwan confirmed that the synthetic speech sounded very fluent and natural.
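The abstract does not say which statistical model TA uses for tagging. As a minimal sketch, assuming an HMM-style tagger, each candidate word found in the lexicon can be scored by a log P(word | POS) emission plus a POS-bigram log P(POS | previous POS) transition, and a Viterbi search over the lexicon lattice then returns the jointly best segmentation and tag sequence. The toy `LEXICON`, the tag names, all log-probabilities, `UNSEEN`, `MAX_WORD_LEN`, and the function `tag` are hypothetical illustrations, not values or names from the paper.

```python
# Hypothetical toy lexicon: word -> {POS tag: log P(word | tag)}.
# The paper's lexicon has about 80,000 entries; these numbers are made up.
LEXICON = {
    "我們": {"PRON": -1.0},
    "我": {"PRON": -1.5},
    "們": {"SUFFIX": -2.0},
    "喜歡": {"VERB": -1.2},
    "語音": {"NOUN": -1.5},
    "合成": {"VERB": -2.0, "NOUN": -2.5},
    "語音合成": {"NOUN": -2.2},
}

# Hypothetical POS-bigram log P(tag | previous tag); "<s>" marks sentence start.
TRANSITIONS = {
    ("<s>", "PRON"): -0.5,
    ("PRON", "VERB"): -0.4,
    ("PRON", "SUFFIX"): -3.0,
    ("SUFFIX", "VERB"): -1.0,
    ("VERB", "NOUN"): -0.5,
    ("NOUN", "VERB"): -1.0,
    ("NOUN", "NOUN"): -1.2,
}
UNSEEN = -10.0      # log-prob penalty for tag pairs missing from TRANSITIONS
MAX_WORD_LEN = 4    # longest lexicon entry, in characters


def tag(text):
    """Jointly segment `text` and tag each word via Viterbi over a word lattice."""
    n = len(text)
    # best[i][t] = (score, backpointer) for the best analysis of text[:i]
    # whose last word has tag t; backpointer is (word start, previous tag).
    best = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word not in LEXICON or not best[start]:
                continue
            for pos, emit in LEXICON[word].items():
                for prev_pos, (prev_score, _) in best[start].items():
                    score = prev_score + TRANSITIONS.get((prev_pos, pos), UNSEEN) + emit
                    if pos not in best[end] or score > best[end][pos][0]:
                        best[end][pos] = (score, (start, prev_pos))
    if not best[n]:
        raise ValueError("text cannot be segmented with this lexicon")
    # Trace back from the best-scoring final tag.
    pos = max(best[n], key=lambda t: best[n][t][0])
    end, words = n, []
    while end > 0:
        _, (start, prev_pos) = best[end][pos]
        words.append((text[start:end], pos))
        end, pos = start, prev_pos
    return words[::-1]
```

With these toy scores, `tag("我們喜歡語音合成")` returns `[("我們", "PRON"), ("喜歡", "VERB"), ("語音合成", "NOUN")]`, preferring the single lexicon entry 語音合成 over the split 語音 + 合成. The resulting word/POS sequence is the kind of input from which base-syllables and PIG features would then be derived.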
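The abstract names a four-layer RNN for PIG but gives no architecture details. The sketch below shows only the data flow under one plausible reading of "four layers": input, hidden, recurrent hidden, and output, where the recurrent layer feeds its previous state back to itself so that each syllable's prosody depends on the preceding context. The layer sizes, the per-syllable feature vector, the use of four coefficients to represent each pitch contour, the class `ProsodyRNN`, and the random untrained weights are all assumptions; training (for example by backpropagation through time) is omitted.

```python
import math
import random

# Assumed sizes; the abstract gives none of these.
INPUT_DIM = 6        # hypothetical per-syllable linguistic features from TA
HIDDEN_DIM = 8
RECURRENT_DIM = 8
# Assumed output layout: 4 pitch-contour coefficients, energy level,
# initial duration, final duration, and the pause after the syllable.
OUTPUT_FIELDS = ("pitch_c0", "pitch_c1", "pitch_c2", "pitch_c3",
                 "energy", "initial_dur", "final_dur", "pause_dur")
OUTPUT_DIM = len(OUTPUT_FIELDS)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def affine(weights, bias, x):
    """Compute W @ x + b for list-based matrices."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class ProsodyRNN:
    """Input -> hidden -> recurrent hidden -> linear output (untrained sketch)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = random_matrix(HIDDEN_DIM, INPUT_DIM, rng)
        self.b_in = [0.0] * HIDDEN_DIM
        # The recurrent layer sees the hidden layer plus its own previous state.
        self.w_rec = random_matrix(RECURRENT_DIM, HIDDEN_DIM + RECURRENT_DIM, rng)
        self.b_rec = [0.0] * RECURRENT_DIM
        self.w_out = random_matrix(OUTPUT_DIM, RECURRENT_DIM, rng)
        self.b_out = [0.0] * OUTPUT_DIM

    def generate(self, syllable_features):
        """Return one dict of prosodic parameters per syllable, in order."""
        state = [0.0] * RECURRENT_DIM
        results = []
        for x in syllable_features:
            if len(x) != INPUT_DIM:
                raise ValueError(f"expected {INPUT_DIM} features, got {len(x)}")
            hidden = [math.tanh(v) for v in affine(self.w_in, self.b_in, x)]
            state = [math.tanh(v) for v in affine(self.w_rec, self.b_rec, hidden + state)]
            outputs = affine(self.w_out, self.b_out, state)
            results.append(dict(zip(OUTPUT_FIELDS, outputs)))
        return results
```

Feeding the same feature vector for two consecutive syllables, as in `ProsodyRNN(seed=0).generate([[1.0] * INPUT_DIM] * 2)`, yields different outputs for the two positions, because the second syllable sees the recurrent state left by the first. That context dependence is the reason for choosing a recurrent network over a per-syllable feed-forward one.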
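For the PSOLA stage, here is a minimal time-domain PSOLA (TD-PSOLA) sketch, assuming the pitch marks of each stored base-syllable waveform are already known (the abstract does not say how WT is annotated). Each synthesis mark is mapped back to the nearest analysis mark on the time-warped axis; a two-period, Hann-windowed grain is cut around that analysis mark and overlap-added at the synthesis mark. Spacing synthesis marks by `period / pitch_scale` changes the pitch, and stretching the time axis by `time_scale` changes the duration. The function `td_psola`, its parameters, and the Hann window choice are hypothetical, not taken from the paper.

```python
import math


def hann(length):
    """Symmetric Hann window of the given length."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def td_psola(signal, marks, pitch_scale=1.0, time_scale=1.0):
    """Hypothetical TD-PSOLA: scale pitch by pitch_scale and duration by time_scale.

    signal: list of samples; marks: strictly increasing analysis pitch-mark indices.
    """
    if pitch_scale <= 0 or time_scale <= 0:
        raise ValueError("scales must be positive")
    if len(marks) < 2 or any(b <= a for a, b in zip(marks, marks[1:])):
        raise ValueError("need at least two strictly increasing pitch marks")
    out = [0.0] * int(round(len(signal) * time_scale))
    t = marks[0] * time_scale          # first synthesis mark
    end = marks[-1] * time_scale
    while t < end:
        # Map the synthesis time back to the original time axis and pick
        # the closest analysis mark.
        source_time = t / time_scale
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - source_time))
        # Local pitch period at that analysis mark.
        if i + 1 < len(marks):
            period = marks[i + 1] - marks[i]
        else:
            period = marks[i] - marks[i - 1]
        window = hann(2 * period + 1)
        center = int(round(t))
        # Overlap-add a two-period windowed grain centred on the synthesis mark.
        for n, w in enumerate(window):
            src = marks[i] - period + n
            dst = center - period + n
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]
        # Next synthesis mark: target period = analysis period / pitch_scale.
        t += period / pitch_scale
    return out
```

As a quick check, for a 1000-sample pulse train with one pulse every 100 samples (marks at 100, 200, ..., 900), `td_psola(signal, marks, pitch_scale=2.0)` keeps the length at 1000 samples but places output pulses every 50 samples, doubling the pitch, while `time_scale=1.5` stretches the output to 1500 samples at the original pitch.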
Pages: 1421-1424
Number of pages: 4