A Variable Break Prediction Method Using CART in a Japanese Text-to-Speech System

Cited by: 1

Authors:

Na, Deok-Su [1]

Bae, Myung-Jin [2]

Affiliations:

[1] Voiceware Co Ltd, Seoul 133120, South Korea

[2] Soongsil Univ, Dept Informat & Telecommun Engn, Seoul, South Korea

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2009, Vol. E92D, No. 02

Keywords:

text-to-speech system; break prediction; variable break;

DOI:

10.1587/transinf.E92.D.349

Chinese Library Classification (CLC) number:

TP [Automation and computer technology];

Discipline classification code:

0812 ;

Abstract:

Break prediction is an important step in text-to-speech systems because break indices (BIs) strongly influence how accurately prosodic phrase boundaries are represented. Accurate prediction is difficult, however, since BIs often depend on the meaning of a sentence or the reading style of the speaker. In Japanese, predicting accentual phrase boundaries (APBs) and major phrase boundaries (MPBs) is particularly difficult. This paper therefore presents a method that compensates for APB and MPB prediction errors. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is identified using a classification and regression tree (CART), and multiple prosodic targets for pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. The experimental results show that the proposed method improves the naturalness of synthesized speech.
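The last stage described in the abstract, unit selection against multiple prosodic targets, can be illustrated with a minimal sketch. It is not the authors' implementation: the break label is assumed to come from an upstream classifier, the pitch and duration values, the weights, and every name (`PROSODY`, `targets_for`, `target_cost`, `select_unit`, `UNITS`) are hypothetical, and only a target cost is modeled (no concatenation cost). The idea shown is that a fixed break yields one prosodic target, while a variable break yields both the APB and the MPB target, so the search can pick whichever candidate unit fits either one best.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    label: str          # "APB" or "MPB"
    pitch_hz: float     # hypothetical target pitch
    duration_ms: float  # hypothetical target duration


@dataclass(frozen=True)
class Unit:
    name: str
    pitch_hz: float
    duration_ms: float


# Illustrative prosodic targets per boundary type (assumed values).
PROSODY = {
    "APB": Target("APB", 180.0, 120.0),
    "MPB": Target("MPB", 140.0, 220.0),
}


def targets_for(break_type):
    """A fixed break gives one target; a variable break (VB) gives both."""
    if break_type == "VB":
        return [PROSODY["APB"], PROSODY["MPB"]]
    return [PROSODY[break_type]]


def target_cost(unit, target, w_pitch=1.0, w_dur=0.5):
    """Weighted absolute distance in pitch and duration (assumed cost form)."""
    return (w_pitch * abs(unit.pitch_hz - target.pitch_hz)
            + w_dur * abs(unit.duration_ms - target.duration_ms))


def select_unit(units, break_type):
    """Return the (unit, target) pair with the lowest target cost."""
    pairs = [(u, t) for u in units for t in targets_for(break_type)]
    return min(pairs, key=lambda p: target_cost(p[0], p[1]))


# Hypothetical candidate units from a speech corpus.
UNITS = [Unit("a", 175.0, 150.0), Unit("b", 142.0, 215.0)]

if __name__ == "__main__":
    for bt in ("APB", "MPB", "VB"):
        unit, target = select_unit(UNITS, bt)
        print(bt, "->", unit.name, "matched against", target.label)
```

With a fixed APB label the search is forced onto the APB target even when a better-fitting unit exists for the MPB reading; labeling the boundary as VB widens the search to both targets, which is the flexibility the abstract attributes to variable breaks.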


Pages: 349-352

Page count: 4
