A Mandarin text-to-speech system

Cited by: 0

Authors:

Hwang, SH

Chen, SH

Wang, YR

Affiliation:

Source:

ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 | 1996

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, the implementation of a high-performance Mandarin TTS system is presented. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table (WT) of 411 base-syllables, and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical-model-based method is first employed to automatically tag the input text to obtain the word sequence and the associated part-of-speech (POS) sequence. A lexicon containing about 80000 words is used in the tagging process. Then the corresponding base-syllable sequence is found and used to retrieve the basic waveform sequence from WT. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information, including the pitch contour, energy level, initial duration and final duration of each syllable, as well as the inter-syllable pause duration. Finally, in PSOLA the basic waveform sequence is modified using the prosodic information to generate the output synthetic speech. The whole system is implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card. Only 3.2 Mbytes of memory are required. It can synthesize speech in real time for any input Chinese text. Informal listening tests by many native Chinese speakers living in Taiwan confirmed that the synthetic speech sounded very fluent and natural.
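The data flow between the four parts named in the abstract (TA, PIG, WT, PSOLA) can be sketched as below. This is only an illustrative outline under stated assumptions, not the authors' implementation: every function and field name is hypothetical, the statistical tagger is replaced by a greedy longest-match lookup in a toy lexicon, the RNN is replaced by fixed default values, and the PSOLA step is replaced by plain amplitude scaling plus inserted silence. The sample rate is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Prosody:
    """Per-syllable prosodic targets listed in the abstract (field names are hypothetical)."""
    pitch_contour: List[float]  # representation of the contour is an assumption
    energy_level: float         # used here as a simple gain factor
    initial_duration: float     # seconds
    final_duration: float       # seconds
    pause_duration: float       # seconds of silence after the syllable


def text_analysis(text: str, lexicon: Dict[str, Tuple[str, List[str]]]):
    """TA stand-in: greedy longest-match lookup, NOT the paper's statistical tagger."""
    words, pos_tags, syllables = [], [], []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate word first
            entry = lexicon.get(text[i:j])
            if entry is not None:
                pos, word_syllables = entry
                words.append(text[i:j])
                pos_tags.append(pos)
                syllables.extend(word_syllables)
                i = j
                break
        else:
            i += 1  # skip characters the toy lexicon does not cover
    return words, pos_tags, syllables


def generate_prosody(syllables: List[str]) -> List[Prosody]:
    """PIG stand-in: fixed defaults instead of the four-layer RNN."""
    return [Prosody([0.0], 0.5, 0.05, 0.15, 0.0) for _ in syllables]


def modify_waveform(unit: List[float], p: Prosody, sample_rate: int) -> List[float]:
    """PSOLA stand-in: scales amplitude and appends silence; no pitch or duration change."""
    scaled = [sample * p.energy_level for sample in unit]
    silence = [0.0] * int(round(p.pause_duration * sample_rate))
    return scaled + silence


def synthesize(text, lexicon, wave_table: Dict[str, List[float]], sample_rate: int = 8000):
    """Run TA -> PIG -> WT lookup -> waveform modification and concatenate the result."""
    _, _, syllables = text_analysis(text, lexicon)
    prosody = generate_prosody(syllables)
    output: List[float] = []
    for syllable, p in zip(syllables, prosody):
        output.extend(modify_waveform(wave_table[syllable], p, sample_rate))
    return output
```

A caller would supply a lexicon mapping each word to a POS tag and its base-syllable list, plus a waveform table keyed by base-syllable, and pass both to `synthesize`. In the system described above, the table would hold 411 base-syllable waveforms and the lexicon about 80000 words.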

Pages: 1421-1424

Page count: 4

