AUTOMATIC SYNTHESIS UNIT GENERATION FOR ENGLISH SPEECH SYNTHESIS BASED ON MULTILAYERED CONTEXT ORIENTED CLUSTERING

Cited by: 0
Author
NAKAJIMA, S
Affiliation
[1] Speech and Acoustics Laboratory, NTT Human Interface Laboratories, 1-2356 Take, Yokosuka, Kanagawa
Keywords
Context dependent unit; Multi-lingual speech synthesis; Phonetic context; Speech synthesis
DOI
10.1016/0167-6393(94)90025-6
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a new synthesis unit learning method aimed at multi-lingual speech synthesis and describe its application to English speech synthesis. The method, termed Multi-Layered Context Oriented Clustering (ML-COC), generalizes the COC method, which has been applied to Japanese speech synthesis. The conventional COC method produces a set of phonetic-context-dependent units through a cluster splitting process. In ML-COC, the notion of context is generalized: factors other than phonetic context, such as stress and syntactic boundaries, are taken into account to capture the richer phoneme variations of English. A synthesis unit generation experiment shows that ML-COC produces about three times as many synthesis units as the conventional COC (Single-Layered COC: SL-COC) method, and that the average intra-cluster variance of the ML-COC units is 20% lower than that of the SL-COC units. These results suggest that the ML-COC synthesis units reflect the phonological structure of English much more appropriately than the SL-COC units do. To validate the effectiveness of the ML-COC method, we conducted a preference experiment using synthesized speech, in which 52 sentences were presented to 10 subjects. The ML-COC method was preferred over the conventional SL-COC method by a score of 70% to 30%.
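The cluster splitting idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not state the splitting criterion, so the greedy variance-reduction rule, the one-dimensional `feature` value, the `min_size` threshold, and the context factor names `left` and `stress` are all assumptions made for illustration. The only point carried over from the abstract is that SL-COC splits on phonetic context alone, while ML-COC may also split on factors such as stress.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def total_sse(clusters):
    """Total within-cluster scatter over all clusters."""
    return sum(sse([s["feature"] for s in c]) for c in clusters)

def best_split(cluster, factors, min_size):
    """Best single-factor binary split as (gain, inside, outside), or None."""
    base = sse([s["feature"] for s in cluster])
    best = None
    for factor in factors:
        for value in {s["context"][factor] for s in cluster}:
            inside = [s for s in cluster if s["context"][factor] == value]
            outside = [s for s in cluster if s["context"][factor] != value]
            if len(inside) < min_size or len(outside) < min_size:
                continue  # assumed guard: keep enough segments per unit
            gain = base - total_sse([inside, outside])
            if best is None or gain > best[0]:
                best = (gain, inside, outside)
    return best

def context_clustering(segments, factors, min_size=2, min_gain=1e-9):
    """Greedily split clusters on context factors until no split helps."""
    clusters = [list(segments)]
    while True:
        candidates = []
        for i, c in enumerate(clusters):
            split = best_split(c, factors, min_size)
            if split is not None and split[0] > min_gain:
                candidates.append((split[0], i, split[1], split[2]))
        if not candidates:
            return clusters
        _, i, inside, outside = max(candidates, key=lambda c: c[0])
        clusters[i:i + 1] = [inside, outside]

# Toy segments of one phoneme (hypothetical data): the feature depends on
# stress, not on the left-hand phoneme.
segments = [
    {"feature": 1.0, "context": {"left": "t", "stress": True}},
    {"feature": 5.0, "context": {"left": "t", "stress": False}},
    {"feature": 1.2, "context": {"left": "s", "stress": True}},
    {"feature": 5.2, "context": {"left": "s", "stress": False}},
]

sl_units = context_clustering(segments, factors=["left"])            # phonetic context only
ml_units = context_clustering(segments, factors=["left", "stress"])  # generalized context
print(total_sse(sl_units), total_sse(ml_units))
```

In this toy data the stress factor explains the variation, so allowing splits on it yields tighter clusters than splitting on the neighboring phoneme alone. That is the kind of effect the abstract reports in terms of intra-cluster variance, although the numbers here have no relation to the paper's results.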
Pages: 313-324
Page count: 12