INTONATIONAL PHRASE BREAK PREDICTION FOR TEXT-TO-SPEECH SYNTHESIS USING DEPENDENCY RELATIONS

Authors
Mishra, Taniya [1]
Kim, Yeon-jun [1]
Bangalore, Srinivas [1]
Affiliations
[1] Interactions, 31 Hayward St, Franklin, MA 02038 USA
Keywords
Intonational phrase; phrase breaks; IP prediction; prosody; text analysis
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Intonational phrase (IP) break prediction is an important part of front-end analysis in a text-to-speech system. Standard approaches to IP break prediction rely on linguistic rules or, more recently, on lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, this paper explores the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech (POS) tags and dependency relations achieves an F-score of 0.81, with the added benefit of being more portable across domains. This work also examines the effect of contextual information on prediction performance. The evaluation shows that a POS-tag-based model using a three-token left context suffers only a 2% drop in recall compared to a model that uses both left and right context, which suggests that such a model is viable for incremental text-to-speech systems.
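As a rough illustration of the two ideas the abstract relies on, the sketch below builds left-context-only POS features for a candidate break position and computes an F-score over binary break labels. It is not the authors' implementation. The function names, the `<s>` padding symbol, the feature keys, and the reading of "three-token left context" as the current token plus the three preceding tokens are all assumptions made for this example.

```python
from typing import Dict, List

PAD = "<s>"  # hypothetical padding symbol for positions before the sentence start


def left_context_features(pos_tags: List[str], i: int, width: int = 3) -> Dict[str, str]:
    """Features for the juncture after token i, using only tokens 0..i.

    No right context is consulted, so the features can be computed
    incrementally as tokens arrive.
    """
    feats = {"pos_0": pos_tags[i]}
    for k in range(1, width + 1):
        j = i - k
        feats[f"pos_-{k}"] = pos_tags[j] if j >= 0 else PAD
    return feats


def f_score(gold: List[int], pred: List[int]) -> float:
    """Balanced F-score (F1) for binary labels, where 1 marks a break."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Feature dictionaries of this kind could be passed to any classifier that accepts sparse categorical features. The abstract does not name the learner used, so none is assumed here.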
Pages: 4919-4923
Page count: 5