Constructing a phonetic-rich speech corpus while controlling time-dependent voice quality variability for English speech synthesis

Cited by: 0
Authors
Ni, Jinfu
Hirai, Toshio
Kawai, Hisashi
Affiliations
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a practical approach to constructing a large-scale speech corpus for corpus-based speech synthesis. The approach consists of (1) selecting a source text corpus that fits limited target domains; (2) analyzing the source text corpus to obtain the unit statistics; (3) automatically extracting prompt subjects (sentences) from the source text corpus to maximize the intended unit coverage within a given amount of text; and (4) recording the prompt subjects while controlling the critical factors that cause undesirable voice variability. The paper describes related computational methods, such as a greedy algorithm for prompt selection, the proximity effects found in a real recording system, and a technique for detecting time-dependent voice variations. While the approach is demonstrated for English, it is also promising for other languages.
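The greedy prompt selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`char_bigrams`, `greedy_select`), the use of letter bigrams as a stand-in for phonetic units, the word-count budget, and the gain-per-word scoring rule are all assumptions made for illustration. A real system would presumably derive units such as phones or diphones from a pronunciation lexicon.

```python
def char_bigrams(sentence):
    """Hypothetical unit extractor: letter bigrams stand in for phonetic units."""
    text = "".join(c for c in sentence.lower() if c.isalpha() or c == " ")
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(sentences, budget_words, units_of=char_bigrams):
    """Greedily pick sentences that add the most new units per word.

    Stops when no remaining sentence fits the word budget or adds a new unit.
    Returns the selected sentences (in pick order) and the set of covered units.
    """
    covered = set()
    selected = []
    used_words = 0
    remaining = list(sentences)

    while remaining:
        best, best_gain = None, 0.0
        for s in remaining:
            n_words = len(s.split())
            if n_words == 0 or used_words + n_words > budget_words:
                continue  # empty sentence or it would exceed the text budget
            # Assumed scoring rule: newly covered units, normalized by length.
            gain = len(units_of(s) - covered) / n_words
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            break  # nothing fits the budget or nothing adds coverage
        selected.append(best)
        covered |= units_of(best)
        used_words += len(best.split())
        remaining.remove(best)

    return selected, covered


if __name__ == "__main__":
    corpus = [
        "The quick brown fox jumps over the lazy dog",
        "She sells sea shells",
        "The dog jumps",
    ]
    prompts, units = greedy_select(corpus, budget_words=13)
    print(prompts)
    print(len(units), "distinct units covered")
```

Normalizing the gain by sentence length is one common way to respect a fixed text budget; selecting by raw gain alone would favor long sentences. Each pass rescans all remaining sentences, which is adequate for a sketch but would need an incremental score update for a large source corpus.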
Pages: 881-884
Page count: 4
Related papers
2 in total
  • [1] A study on time-dependent voice quality variation in a large-scale single speaker speech corpus used for speech synthesis
    Kawai, H
    Tsuzaki, M
    PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 15 - 18
  • [2] Subsegmental Representation in Child Speech Production: Structured Variability of Stop Consonant Voice Onset Time in American English and Cantonese
    Chodroff, Eleanor
    Bradshaw, Leah
    Livesay, Vivian
    JOURNAL OF CHILD LANGUAGE, 2023, 50 (05) : 1245 - 1273