A Mandarin text-to-speech system

Cited by: 0

Authors:

Hwang, SH

Chen, SH

Wang, YR

Source:

ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 | 1996

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, the implementation of a high-performance Mandarin TTS system is presented. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table (WT) of 411 base-syllables, and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical-model-based method is first employed to automatically tag the input text to obtain the word sequence and the associated part-of-speech (POS) sequence. A lexicon containing about 80000 words is used in the tagging process. The corresponding base-syllable sequence is then found and used to retrieve the basic waveform sequence from WT. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information, including the pitch contour, energy level, initial duration, and final duration of each syllable, as well as the inter-syllable pause duration. Finally, in PSOLA, the basic waveform sequence is modified using the prosodic information to generate the output synthetic speech. The whole system is implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card. Only 3.2 Mbytes of memory are required. It can synthesize speech in real time for any input Chinese text. Informal listening tests by many native Chinese speakers living in Taiwan confirmed that the synthetic speech sounded very fluent and natural.
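The data flow among the four parts named in the abstract can be sketched as follows. This is only an illustrative outline, not the authors' implementation: every name here (`SyllableProsody`, `synthesize`, `analyze`, `predict_prosody`, `modify`) is hypothetical, the stages are passed in as callables rather than implemented, and the default sample rate and the use of milliseconds are assumptions that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Waveform = List[float]


@dataclass
class SyllableProsody:
    """Per-syllable prosodic targets listed in the abstract (units assumed)."""
    pitch_contour: Sequence[float]   # representation is an assumption
    energy_level: float
    initial_duration_ms: float
    final_duration_ms: float
    pause_after_ms: float            # inter-syllable pause following this syllable


def synthesize(
    text: str,
    analyze: Callable[[str], Tuple[List[str], List[dict]]],                 # TA stand-in
    predict_prosody: Callable[[List[dict]], List[SyllableProsody]],         # PIG stand-in
    waveform_table: Dict[str, Waveform],                                    # WT stand-in
    modify: Callable[[Waveform, SyllableProsody], Waveform],                # PSOLA stand-in
    sample_rate: int = 16000,                                               # assumed value
) -> Waveform:
    # TA: base-syllable sequence plus linguistic features for the prosody model.
    syllables, features = analyze(text)
    # PIG: one set of prosodic targets per syllable.
    prosody = predict_prosody(features)
    if len(prosody) != len(syllables):
        raise ValueError("expected one prosody record per syllable")

    output: Waveform = []
    for syllable, target in zip(syllables, prosody):
        # WT: look up the stored basic waveform of this base-syllable.
        unit = waveform_table[syllable]
        # PSOLA: modify the basic waveform to match the prosodic targets.
        output.extend(modify(unit, target))
        # Insert silence for the inter-syllable pause.
        n_silence = int(round(target.pause_after_ms * sample_rate / 1000))
        output.extend([0.0] * n_silence)
    return output
```

Passing the stages in as callables keeps the sketch honest about what the abstract leaves unspecified: the tagger, the RNN, and the PSOLA modification each need a real implementation before this produces speech.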

Pages: 1421-1424

Page count: 4
