A Variable Break Prediction Method Using CART in a Japanese Text-to-Speech System

Cited by: 1
Authors
Na, Deok-Su [1]
Bae, Myung-Jin [2]
Affiliations
[1] Voiceware Co Ltd, Seoul 133120, South Korea
[2] Soongsil Univ, Dept Informat & Telecommun Engn, Seoul, South Korea
Source
Keywords
text-to-speech system; break prediction; variable break
DOI
10.1587/transinf.E92.D.349
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Break prediction is an important step in text-to-speech systems, as break indices (BIs) have a great influence on how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult, since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, predicting accentual phrase boundaries (APBs) and major phrase boundaries (MPBs) is particularly difficult. This paper therefore presents a method to compensate for APB and MPB prediction errors. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree (CART), and multiple prosodic targets for pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. The experimental results show that the proposed method improves the naturalness of synthesized speech.
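The last two steps of the abstract (generating multiple prosodic targets for a VB, then running unit selection over them) can be illustrated with a minimal sketch. It is not the authors' implementation: the target values in `TARGETS`, the cost function `target_cost`, and the function names are hypothetical, join costs are omitted, and the break labels are assumed to have already been produced by a classifier such as CART.

```python
from itertools import product

# Hypothetical prosodic targets per break type (pitch in Hz, duration in ms).
# These numbers are illustrative assumptions, not values from the paper.
TARGETS = {
    "APB": {"pitch": 180.0, "duration": 90.0},
    "MPB": {"pitch": 140.0, "duration": 160.0},
}


def expand_breaks(breaks):
    """Expand a break sequence into hypotheses.

    A fixed break ("APB" or "MPB") keeps its label; a variable break ("VB")
    is tried as both "APB" and "MPB".
    """
    options = [("APB", "MPB") if b == "VB" else (b,) for b in breaks]
    return [list(combo) for combo in product(*options)]


def target_cost(unit, target):
    """Assumed cost: scaled absolute distance in pitch and duration."""
    return (abs(unit["pitch"] - target["pitch"]) / 10.0
            + abs(unit["duration"] - target["duration"]) / 10.0)


def select_units(breaks, candidates):
    """Pick the hypothesis and units with the lowest total target cost.

    candidates[i] is the list of database units available at boundary i.
    Returns a tuple (total_cost, hypothesis, chosen_units).
    """
    best = None
    for hypothesis in expand_breaks(breaks):
        total = 0.0
        chosen = []
        for label, units in zip(hypothesis, candidates):
            target = TARGETS[label]
            unit = min(units, key=lambda u: target_cost(u, target))
            total += target_cost(unit, target)
            chosen.append(unit)
        if best is None or total < best[0]:
            best = (total, hypothesis, chosen)
    return best


# Toy input: one fixed APB followed by one variable break.
breaks = ["APB", "VB"]
candidates = [
    [{"pitch": 182.0, "duration": 95.0}],
    [{"pitch": 141.0, "duration": 158.0}, {"pitch": 200.0, "duration": 60.0}],
]
cost, hypothesis, units = select_units(breaks, candidates)
print(hypothesis, round(cost, 3))
```

In this toy input the second boundary is resolved by whichever label the available units match best, which is the intuition behind deferring the APB/MPB decision to unit selection.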
Pages: 349-352
Page count: 4