A Mandarin text-to-speech system

Authors
Hwang, SH
Chen, SH
Wang, YR
Affiliation
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents the implementation of a high-performance Mandarin TTS system. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table (WT) of 411 base-syllables, and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical-model-based method is first employed to automatically tag the input text, yielding the word sequence and the associated part-of-speech (POS) sequence. A lexicon containing about 80,000 words is used in the tagging process. The corresponding base-syllable sequence is then found and used to retrieve the basic waveform sequence from WT. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information, including the pitch contour, energy level, initial duration, and final duration of each syllable, as well as the inter-syllable pause duration. Finally, in PSOLA the basic waveform sequence is modified using the prosodic information to generate the output synthetic speech. The whole system is implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card. Only 3.2 Mbytes of memory are required. It can synthesize speech in real time for any input Chinese text. Informal listening tests by many native Chinese speakers living in Taiwan confirmed that the synthetic speech sounded very fluent and natural.
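The data flow between the four parts described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`Prosody`, `LEXICON`, `WAVEFORM_TABLE`, `text_analysis`, `generate_prosody`, `synthesize`, `tts`) is hypothetical, and so are the sampling rate and the toy data. The statistical tagger is replaced by greedy longest-match lookup, the RNN by constant values, and PSOLA by simple gain scaling plus inserted silence, so only the stage boundaries follow the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SAMPLE_RATE = 16000  # assumption: the abstract does not state a sampling rate


@dataclass
class Prosody:
    """Per-syllable prosodic targets listed in the abstract."""
    pitch_contour: List[float]  # placeholder representation (Hz values)
    energy_level: float         # placeholder representation (linear gain)
    initial_duration: float     # seconds
    final_duration: float       # seconds
    pause_duration: float       # seconds of silence after the syllable


# Hypothetical toy lexicon: word -> (POS tag, base syllables).
LEXICON: Dict[str, Tuple[str, List[str]]] = {
    "你好": ("V", ["ni", "hao"]),
    "世界": ("N", ["shi", "jie"]),
}

# Hypothetical waveform table: base syllable -> stored samples.
WAVEFORM_TABLE: Dict[str, List[float]] = {
    s: [0.1] * 8 for s in ["ni", "hao", "shi", "jie"]
}


def text_analysis(text: str) -> List[Tuple[str, str, List[str]]]:
    """TA stand-in: greedy longest-match lookup, not a statistical tagger."""
    words, i = [], 0
    max_len = max(len(w) for w in LEXICON)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in LEXICON:
                pos, syllables = LEXICON[candidate]
                words.append((candidate, pos, syllables))
                i += n
                break
        else:
            i += 1  # skip characters missing from the toy lexicon
    return words


def generate_prosody(words: List[Tuple[str, str, List[str]]]) -> List[Prosody]:
    """PIG stand-in: constant values where the paper uses a four-layer RNN."""
    prosody = []
    for _, _, syllables in words:
        for k, _ in enumerate(syllables):
            word_final = k == len(syllables) - 1
            prosody.append(Prosody(
                pitch_contour=[200.0, 200.0],
                energy_level=1.0,
                initial_duration=0.05,
                final_duration=0.15,
                pause_duration=0.001 if word_final else 0.0,
            ))
    return prosody


def synthesize(syllables: List[str], prosody: List[Prosody]) -> List[float]:
    """Synthesis stand-in: gain scaling and pauses only; real PSOLA would
    also modify pitch and duration."""
    out: List[float] = []
    for syllable, p in zip(syllables, prosody):
        out.extend(x * p.energy_level for x in WAVEFORM_TABLE[syllable])
        out.extend([0.0] * round(p.pause_duration * SAMPLE_RATE))
    return out


def tts(text: str) -> List[float]:
    """Chain the stages: TA -> PIG -> WT lookup -> waveform modification."""
    words = text_analysis(text)
    syllables = [s for _, _, syls in words for s in syls]
    return synthesize(syllables, generate_prosody(words))
```

Calling `tts("你好世界")` runs the two-word toy input through all four stages and returns a flat list of samples. A real system would replace each stand-in with the corresponding component named in the abstract.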
Pages: 1421-1424
Page count: 4