INTONATIONAL PHRASE BREAK PREDICTION FOR TEXT-TO-SPEECH SYNTHESIS USING DEPENDENCY RELATIONS

Cited by: 0

Authors:

Mishra, Taniya [1]

Kim, Yeon-jun [1]

Bangalore, Srinivas [1]

Affiliation:

[1] Interactions, 31 Hayward St, Franklin, MA 02038 USA

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

Intonational phrase; phrase breaks; IP prediction; prosody; text-analysis;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Intonational phrase (IP) break prediction is an important aspect of front-end analysis in a text-to-speech system. Standard approaches to IP break prediction rely on linguistic rules or, more recently, lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech tags and dependency relations achieves an F-score of 0.81, with the added benefit of being more portable across domains. We also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag-based model results in only a 2% drop in recall compared to a model that uses both left and right context, which suggests the viability of such a model for incremental text-to-speech systems.
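
As a rough illustration of the ideas in the abstract, the sketch below builds non-lexicalized, left-context-only features from part-of-speech tags and computes a balanced F-score from error counts. It is not the authors' implementation. The function names, the padding symbol, the feature keys, and the reading of "three-token left context" as the three tags preceding the current token are all assumptions made for this example, as is the use of the balanced (F1) form of the F-score.

```python
from typing import Dict, List

# Hypothetical padding symbol for positions before the start of the sentence.
PAD = "<s>"


def left_context_features(pos_tags: List[str], index: int, width: int = 3) -> Dict[str, str]:
    """Build features for the token at `index` from POS tags only.

    Uses the token's own tag plus the tags of up to `width` preceding
    tokens. No word identities and no right context are used, so the
    features are available as soon as the token has been read.
    """
    features = {"pos_0": pos_tags[index]}
    for offset in range(1, width + 1):
        j = index - offset
        features[f"pos_-{offset}"] = pos_tags[j] if j >= 0 else PAD
    return features


def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Example tag sequence (Penn Treebank style tags, chosen for illustration).
tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]
feats = left_context_features(tags, 3)
```

Feature dictionaries of this kind could be passed to any standard classifier that decides, for each token, whether a break follows it. Because nothing depends on upcoming tokens, such a model can run incrementally, which is the property the abstract highlights.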


Pages: 4919-4923

Number of pages: 5
