Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis

Cited by: 39

Authors:

Wu, CH [1]

Chen, JH [1]

Affiliation:

[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan

Source:

SPEECH COMMUNICATION | 2001 / Vol. 35 / Issue 3-4

Keywords:

Chinese text-to-speech conversion; synthesis units; prosodic information; concatenative synthesis; pitch contour; syllable duration;

DOI:

10.1016/S0167-6393(00)00075-3

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, some approaches to the generation of synthesis units and prosodic information are proposed for Mandarin Chinese text-to-speech (TTS) conversion. The monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database based on two cost functions, which minimize the inter- and intra-syllable distortion. The speech database is also employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a phrase. This template tree stores the prosodic features, including pitch contour, average energy, and syllable duration of a word, for possible combinations of linguistic features. Two modules for sentence intonation and template selection are proposed to generate the target prosodic templates. The experimental results showed that the synthesized prosodic features matched quite well with their original counterparts. Evaluation by subjective experiments also confirmed the satisfactory performance of these approaches. © 2001 Elsevier Science B.V. All rights reserved.
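The word-prosody template tree described in the abstract could be pictured roughly as follows. This is only an illustrative sketch, not the authors' implementation: the feature ordering, the class and function names (`ProsodyTemplate`, `TemplateNode`, `insert_template`, `lookup_template`), the feature encodings, and the numeric values are all assumptions made for the example. The abstract states only which linguistic features index the tree and which prosodic features it stores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed level order of the tree; the abstract lists these four
# linguistic features but does not say how the tree is arranged.
FEATURE_ORDER = ("tone_combination", "word_length", "pos", "position")


@dataclass
class ProsodyTemplate:
    """Prosodic features stored for one word pattern (hypothetical layout)."""
    pitch_contour: List[float]        # e.g. sampled F0 values
    average_energy: float
    syllable_durations: List[float]   # one duration per syllable


@dataclass
class TemplateNode:
    """One node of the tree; children are keyed by a feature value."""
    children: Dict[object, "TemplateNode"] = field(default_factory=dict)
    template: Optional[ProsodyTemplate] = None


def insert_template(root: TemplateNode, features: dict,
                    template: ProsodyTemplate) -> None:
    """Descend one level per feature, creating nodes as needed."""
    node = root
    for name in FEATURE_ORDER:
        node = node.children.setdefault(features[name], TemplateNode())
    node.template = template


def lookup_template(root: TemplateNode,
                    features: dict) -> Optional[ProsodyTemplate]:
    """Return the stored template, or None if the combination is unseen."""
    node = root
    for name in FEATURE_ORDER:
        node = node.children.get(features[name])
        if node is None:
            return None
    return node.template


# Usage with made-up values: a two-syllable noun with tones 3 and 4
# at the start of a phrase.
root = TemplateNode()
word = {"tone_combination": (3, 4), "word_length": 2,
        "pos": "N", "position": "initial"}
insert_template(root, word, ProsodyTemplate([180.0, 150.0, 210.0, 160.0],
                                            0.62, [0.21, 0.24]))
found = lookup_template(root, word)
```

How the original system handles feature combinations that never occur in the speech database is not stated in the abstract; the sketch simply returns `None` in that case.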

Pages: 219-237

Page count: 19
