Data-Driven Phrase Break Prediction for Bengali Text-to-Speech System

Cited by: 0
Authors
Ghosh, Krishnendu [1]
Rao, K. Sreenivasa [1]
Affiliation
[1] Indian Inst Technol, Sch Informat Technol, Kharagpur 721302, W Bengal, India
Source
CONTEMPORARY COMPUTING | 2012 / Vol. 306
Keywords
Phrase break prediction; morphological, positional and structural features; CART; FFNN
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes an approach for accurately predicting the locations of phrase breaks in a sentence for a Bengali text-to-speech (TTS) synthesis system. Determining the positions of phrase breaks is one of the most important tasks in generating natural and intelligible speech. In the current study, the break locations are predicted with a feed-forward neural network (FFNN). To acquire prosodic phrase break knowledge, morphological information is analyzed along with widely used positional and structural features. The importance of each feature is demonstrated using a model-dependent feature selection approach. The phrase break prediction model is then implemented with the selected optimal set of morphological, positional and structural features and incorporated into a Bengali TTS system built using the Festival framework [1]. The performance of the proposed FFNN model is compared with that of the widely used Classification and Regression Tree (CART) model for predicting breaks and no-breaks. The FFNN model is evaluated objectively on the basis of precision, recall, and their harmonic mean, the F-score. The significance of the phrase break module is further analyzed through subjective listening tests.
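The objective evaluation named in the abstract (precision, recall, and F-score for break prediction) can be illustrated with a minimal sketch. The abstract does not give the paper's label scheme or data, so the labels `"B"` (break) and `"NB"` (no-break), the function name `break_prf`, and the toy sequences below are all assumptions made for illustration only.

```python
def break_prf(gold, predicted, positive="B"):
    """Return (precision, recall, F-score) for the `positive` label.

    `gold` and `predicted` are equal-length sequences with one label per
    word juncture. The label names are hypothetical, not from the paper.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: a break was predicted where a break exists.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: a break was predicted where none exists.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: an existing break was missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # The F-score is the harmonic mean of precision and recall.
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data (invented for illustration, not taken from the paper).
gold = ["NB", "NB", "B", "NB", "B", "NB", "NB", "B"]
pred = ["NB", "B", "B", "NB", "NB", "NB", "NB", "B"]
precision, recall, f_score = break_prf(gold, pred)
```

Passing `positive="NB"` would score the no-break class in the same way, which may be useful because the abstract reports prediction of both breaks and no-breaks.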
Pages: 118-129
Page count: 12