This paper presents a practical approach to constructing a large-scale speech corpus for corpus-based speech synthesis. The approach consists of four steps: (1) selecting a source text corpus that fits the limited target domains; (2) analyzing the source text corpus to obtain unit statistics; (3) automatically extracting prompt sentences from the source text corpus so as to maximize the intended unit coverage for a given amount of text; and (4) recording the prompt sentences while controlling the critical factors that cause undesirable voice variability. The paper also describes related methods, such as a greedy algorithm for prompt selection and a technique for detecting time-dependent voice variations, and reports the proximity effects observed in a real recording system. Although the approach is demonstrated in English, it is also promising for other languages.
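The abstract names a greedy algorithm for prompt selection but does not give its details. The following is a minimal sketch of steps (2) and (3) under illustrative assumptions that do not come from the paper: units are diphones derived from phone strings (the hypothetical `diphones` helper), the text budget is counted in phones, and coverage is weighted by each unit's frequency in the source corpus. In each round, the sketch adds the sentence with the largest frequency-weighted gain of not-yet-covered units per phone, among sentences that still fit the budget.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set


def diphones(phones: Sequence[str]) -> List[str]:
    """Hypothetical unit definition: adjacent phone pairs, '#' marks silence."""
    padded = ["#", *phones, "#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def unit_statistics(corpus: Dict[str, Sequence[str]]) -> Counter:
    """Step (2): count how often each unit type occurs in the source corpus."""
    counts: Counter = Counter()
    for phones in corpus.values():
        counts.update(diphones(phones))
    return counts


def greedy_select(corpus: Dict[str, Sequence[str]], budget: int) -> List[str]:
    """Step (3): greedily pick sentence IDs until `budget` phones are used.

    Score = (summed corpus frequency of the sentence's not-yet-covered unit
    types) / (sentence length in phones). Stops when no affordable sentence
    adds any new unit.
    """
    freq = unit_statistics(corpus)
    units = {sid: set(diphones(p)) for sid, p in corpus.items()}
    covered: Set[str] = set()
    selected: List[str] = []
    used = 0
    remaining = set(corpus)
    while remaining:
        best_id: Optional[str] = None
        best_score = 0.0
        for sid in sorted(remaining):  # sorted for deterministic tie-breaking
            length = len(corpus[sid])
            if length == 0 or used + length > budget:
                continue
            gain = sum(freq[u] for u in units[sid] - covered)
            score = gain / length
            if score > best_score:
                best_id, best_score = sid, score
        if best_id is None:
            break
        selected.append(best_id)
        covered |= units[best_id]
        used += len(corpus[best_id])
        remaining.discard(best_id)
    return selected


def token_coverage(corpus: Dict[str, Sequence[str]], selected: Sequence[str]) -> float:
    """Fraction of unit tokens in the source corpus whose type is covered."""
    freq = unit_statistics(corpus)
    covered = set().union(*(diphones(corpus[s]) for s in selected))
    return sum(freq[u] for u in covered) / sum(freq.values())


if __name__ == "__main__":
    # Toy corpus of hypothetical phone strings.
    corpus = {
        "s1": ["h", "e", "l", "o"],
        "s2": ["h", "e", "l", "p"],
        "s3": ["o", "k"],
    }
    chosen = greedy_select(corpus, budget=6)
    print(chosen)  # → ['s1', 's3']
    print(round(token_coverage(corpus, chosen), 3))  # → 0.846
```

Dividing the gain by sentence length favors short, unit-dense sentences, which suits a fixed recording budget. An unweighted variant would count new unit types instead of summing their corpus frequencies.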
Kawai, H
Affiliation: ATR, Adv Telecommun Res Inst Int, Spoken Language Translat Res Labs, Seika 6190288, Soraku, Japan
Tsuzaki, M
Affiliation: ATR, Adv Telecommun Res Inst Int, Spoken Language Translat Res Labs, Seika 6190288, Soraku, Japan
Proceedings of the 2002 IEEE Workshop on Speech Synthesis, 2002, pp. 15-18
Chodroff, Eleanor
Affiliation: Univ York, Dept Language & Linguist Sci, York YO10 5DD, N Yorkshire, England
Bradshaw, Leah
Affiliation: Univ Zurich, Inst Computat Linguist, Zurich, Switzerland; Univ York, Dept Language & Linguist Sci, York YO10 5DD, N Yorkshire, England
Livesay, Vivian
Affiliation: Mt Holyoke Coll, Dept Psychol & Educ, S Hadley, MA 01075 USA; Univ York, Dept Language & Linguist Sci, York YO10 5DD, N Yorkshire, England