INTONATIONAL PHRASE BREAK PREDICTION FOR TEXT-TO-SPEECH SYNTHESIS USING DEPENDENCY RELATIONS

被引:0
|
作者
Mishra, Taniya [1 ]
Kim, Yeon-jun [1 ]
Bangalore, Srinivas [1 ]
机构
[1] Interactions, 31 Hayward St, Franklin, MA 02038 USA
关键词
Intonational phrase; phrase breaks; IP prediction; prosody; text-analysis;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Intonational phrase (IP) break prediction is an important aspect of front-end analysis in a text-to-speech system. Standard approaches for intonational phrase break prediction rely on the use of linguistic rules or more recently, lexicalized data-driven models. Linguistic rules are not robust while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, in this paper, we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, while a lexically driven IP break prediction model yields an F-score of 0.82, a non-lexicalized model that uses part-of-speech tags and dependency relations achieves an F-score of 0.81 with added feature of being more portable across domains. In this work, we also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag based model results in only a 2% drop in recall compared to a model that uses both a left and right context, which suggests the viability of using such a model for incremental text-to-speech system.
引用
收藏
页码:4919 / 4923
页数:5
相关论文
共 50 条
  • [21] Improving text-to-speech synthesis
    Tatham, M
    Lewis, E
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1856 - 1859
  • [22] Issues in text-to-speech synthesis
    Macchi, M
    IEEE INTERNATIONAL JOINT SYMPOSIA ON INTELLIGENCE AND SYSTEMS - PROCEEDINGS, 1998, : 318 - 325
  • [23] Design and evaluation of a phonological phrase parser for Spanish text-to-speech
    Karn, HE
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1696 - 1699
  • [24] Semantic dependency and local convolution for enhancing naturalness and tone in text-to-speech synthesis
    Jiang, Chenglong
    Gao, Ying
    Ng, Wing W. Y.
    Zhou, Jiyong
    Zhong, Jinghui
    Zhen, Hongzhong
    Hu, Xiping
    NEUROCOMPUTING, 2024, 608
  • [25] Multilingual text analysis for text-to-speech synthesis
    Bell Lab, Murray Hill, United States
    International Conference on Spoken Language Processing, ICSLP, Proceedings, 1996, 3 : 1365 - 1368
  • [26] Multilingual text analysis for text-to-speech synthesis
    Sproat, R
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1365 - 1368
  • [27] Two-Stage Prosody Prediction for Emotional Text-to-Speech Synthesis
    Tang, Hao
    Zhou, Xi
    Odisio, Matthias
    Hasegawa-Johnson, Mark
    Huang, Thomas S.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2138 - 2141
  • [28] A hybrid model for text-to-speech synthesis
    Violaro, F
    Boeffard, O
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (05): : 426 - 434
  • [29] Environment Aware Text-to-Speech Synthesis
    Tan, Daxin
    Zhang, Guangyan
    Lee, Tan
    INTERSPEECH 2022, 2022, : 481 - 485
  • [30] Text-to-speech synthesis integrated circuit
    Baskaya, IF
    Aktan, O
    Dündar, G
    PROCEEDINGS OF THE IEEE 12TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 2004, : 653 - 656