EFFICIENT GRADIENT F0 TREE MODEL FOR PROSODY MODELING AND UNIT-SELECTION, APPLIED FOR THE EMBEDDED US ENGLISH CONCATENATIVE TTS

Cited by: 1

Authors:

Shechtman, Slava [1]

Tachibana, Ryuki [2]

Affiliations:

[1] IBM Res Corp, Haifa Res Lab, Haifa, Israel

[2] IBM Res, Tokyo Res Lab, Kanagawa, Japan

Source:

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009

Keywords:

speech synthesis; unit selection; prosody modeling; F0 modeling; embedded TTS; SPEECH SYNTHESIS SYSTEM

DOI:

10.1109/ICASSP.2009.4960567

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Modeling of pitch dynamics in addition to absolute pitch modeling is highly desirable for robust pitch curve prediction and unit selection in concatenative TTS systems. Transition prosody models have been reported to improve consistency and naturalness for pitch-accent and tonal languages, such as Japanese and Mandarin. In the current work we revise a Gradient F0 tree model, originally developed for Japanese, and adjust it for American English. The resultant model requires few computational resources at runtime, which makes it highly suitable for embedded TTS applications. We report encouraging results of applying it to an embedded concatenative TTS system for American English.

Pages: 4249 / +

Number of pages: 2