Learning Prosodic Patterns for Mandarin Speech Synthesis

Authors
Yiqiang Chen
Wen Gao
Tingshao Zhu
Charles Ling
Affiliations
[1] Chinese Academy of Sciences, Institute of Computing Technology
[2] University of Alberta, Department of Computing Science
[3] Edmonton, Department of Computer Science
[4] University of Western Ontario
Keywords
TTS; clustering; Rough Set; ANN; decision tree; Bayesian network
DOI
Not available
Abstract
Widespread use of text-to-speech (TTS) technology requires higher-quality synthesized speech. The prosodic pattern, which mainly describes the variation of pitch, is the key feature: when it is modeled poorly, synthetic speech sounds unnatural and monotonous. The rules used in most Chinese TTS systems are constructed by experts, with weak quality control and low precision. In this paper, we propose a combination of clustering and machine learning techniques to extract prosodic patterns from large databases of actual Mandarin speech, in order to improve the naturalness and intelligibility of synthesized speech. Typical prosody models are found by clustering analysis. Several machine learning techniques, including Rough Set, Artificial Neural Network (ANN), and decision tree, are trained to model fundamental frequency and energy contours, which can be used directly in a pitch-synchronous overlap-add (PSOLA) based TTS system. The experimental results show that the synthesized prosodic features closely resemble their original counterparts for most syllables.
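The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means over pitch contours resampled to a fixed length, and the function names (`resample`, `kmeans`), the deterministic initialization, and the toy contours in Hz are all hypothetical, chosen only for illustration.

```python
def resample(contour, n=5):
    """Linearly resample a pitch contour to n points (assumed normalization)."""
    if len(contour) == 1:
        return [float(contour[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are evenly spaced input points
    (a simplification to keep the sketch deterministic)."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


# Hypothetical toy F0 contours in Hz: three rising, three falling.
contours = [
    [120, 130, 145, 160],
    [118, 128, 150, 165, 170],
    [125, 140, 158],
    [170, 155, 140, 125],
    [165, 150, 138, 120, 115],
    [160, 140, 122],
]
points = [resample(c) for c in contours]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each resulting centroid plays the role of one "typical" contour shape. In a fuller pipeline, a classifier such as a decision tree could then be trained to predict the cluster label of a syllable from its textual context, which is the kind of learning step the abstract describes.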
Pages: 95 - 109
Page count: 14
Related papers
50 in total
  • [1] Learning prosodic patterns for mandarin speech synthesis
    Chen, YQ
    Gao, W
    Zhu, TS
    Ling, C
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2002, 19 (01) : 95 - 109
  • [2] Extracting mandarin prosodic patterns by machine learning
    Zhu, Ting-Shao
    Gao, Wen
    [J]. Zidonghua Xuebao/Acta Automatica Sinica, 2001, 27 (06) : 763 - 769
  • [3] Parsing hierarchical prosodic structure for Mandarin speech synthesis
    Xu, Dawei
    Wang, Haifeng
    Li, Guohua
    Kagoshima, Takehiko
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 745 - 748
  • [4] Development of prosodic patterns in Mandarin-learning infants
    Chen, Li-Mei
    Kent, Raymond D.
    [J]. JOURNAL OF CHILD LANGUAGE, 2009, 36 (01) : 73 - 84
  • [5] Exploration of high-level prosodic patterns for continuous Mandarin speech
    Chiang, Chen-Yu
    Yu, Hsiu-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3977 - +
  • [6] Automatic generation of prosodic structure for high quality Mandarin speech synthesis
    Chou, FC
    Tseng, CY
    Lee, LS
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1624 - 1627
  • [7] Prosodic Cues in Polite and Rude Mandarin Speech
    Fan, Ping
    Gu, Wentao
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [8] An approach to constructing prosodic grammar for Mandarin read speech
    Hong, Yu-Siang
    Chiang, Chen-Yu
    Wang, Yih-Ru
    Chen, Sin-Horng
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (04) : 2406 - 2425
  • [9] Prosodic Modeling in Large Vocabulary Mandarin Speech Recognition
    Huang, Jui-Ting
    Lee, Lin-shan
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1241 - 1244
  • [10] PROSODIC FEATURES OF MANDARIN REPAIR IN CLASSROOM LECTURE SPEECH
    Chen, Helen Kai-yun
    Fang, Wei-te
    Tseng, Chiu-yu
    [J]. 2014 17TH ORIENTAL CHAPTER OF THE INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDIZATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (COCOSDA), 2014