A Mandarin text-to-speech system

Authors
Hwang, SH
Chen, SH
Wang, YR
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, the implementation of a high-performance Mandarin TTS system is presented. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table (WT) of 411 base-syllables, and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical-model-based method is first employed to automatically tag the input text to obtain the word sequence and the associated part-of-speech (POS) sequence. A lexicon containing about 80,000 words is used in the tagging process. Then the corresponding base-syllable sequence is found and used to retrieve the basic waveform sequence from WT. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information, including the pitch contour, energy level, initial duration, and final duration of each syllable, as well as the inter-syllable pause duration. Finally, in PSOLA, the basic waveform sequence is modified using the prosodic information to generate the output synthetic speech. The whole system is implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card. Only 3.2 Mbytes of memory are required. It can synthesize speech in real time for any input Chinese text. Informal listening tests with many native Chinese speakers living in Taiwan confirmed that the synthetic speech sounded very fluent and natural.
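The data flow between the four parts named in the abstract (TA, PIG, WT, PSOLA) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`TOY_LEXICON`, `TOY_WT`, `Prosody`, `text_analysis`, `generate_prosody`, `synthesize`, `tts`) is hypothetical, and each stage is a deliberately simple stand-in: greedy longest-match lookup replaces the statistical tagger, constant values replace the four-layer RNN, and amplitude scaling plus inserted silence replaces PSOLA. The toy lexicon, POS tags, and sample values are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Prosody:
    """Per-syllable prosodic parameters listed in the abstract."""
    pitch_contour: List[float]  # placeholder pitch targets
    energy: float               # relative energy level
    initial_duration: float     # seconds
    final_duration: float       # seconds
    pause: float                # pause after this syllable, seconds


# Hypothetical toy lexicon: word -> (POS tag, base-syllables).
# The paper's lexicon holds about 80,000 words.
TOY_LEXICON: Dict[str, Tuple[str, List[str]]] = {
    "你好": ("V", ["ni", "hao"]),
    "世界": ("N", ["shi", "jie"]),
}

# Hypothetical waveform table: base-syllable -> samples.
# The paper's table covers 411 base-syllables.
TOY_WT: Dict[str, List[float]] = {
    s: [0.5, -0.5, 0.5, -0.5] for s in ("ni", "hao", "shi", "jie")
}

SAMPLE_RATE = 8  # toy rate so that pauses are only a few samples long


def text_analysis(text, lexicon):
    """TA stand-in: greedy longest-match segmentation, not a statistical tagger."""
    words, i = [], 0
    max_len = max(len(w) for w in lexicon)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon:
                pos, syllables = lexicon[candidate]
                words.append((candidate, pos, syllables))
                i += n
                break
        else:
            i += 1  # skip characters the toy lexicon does not cover
    return words


def generate_prosody(words):
    """PIG stand-in: constant values where the paper uses a four-layer RNN."""
    prosody = []
    for _word, _pos, syllables in words:
        for k in range(len(syllables)):
            word_final = k == len(syllables) - 1
            pause = 0.25 if word_final else 0.0
            prosody.append(Prosody([120.0, 110.0], 1.0, 0.05, 0.15, pause))
    return prosody


def synthesize(syllables, prosody, wt, sample_rate):
    """PSOLA stand-in: scale amplitude by energy and append silence for pauses."""
    out = []
    for syllable, p in zip(syllables, prosody):
        out.extend(p.energy * x for x in wt[syllable])
        out.extend([0.0] * int(p.pause * sample_rate))
    return out


def tts(text):
    """Chain the four parts: TA -> PIG -> WT lookup -> waveform modification."""
    words = text_analysis(text, TOY_LEXICON)
    syllables = [s for _w, _p, syls in words for s in syls]
    prosody = generate_prosody(words)
    return synthesize(syllables, prosody, TOY_WT, SAMPLE_RATE)
```

Calling `tts("你好世界")` returns a flat list of samples. The point of the sketch is only the ordering of the stages and the kind of data passed between them, since the abstract does not specify the models or signal processing in enough detail to reproduce them.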
Pages: 1421-1424
Page count: 4