A HMM-based Mandarin Chinese Singing Voice Synthesis System

Cited by: 0
Authors
Xian Li [1]
Zengfu Wang [2]
Affiliations
[1] Department of Automation, University of Science and Technology of China
[2] Institute of Intelligent Machines, Chinese Academy of Sciences
Keywords
Singing voice synthesis; melisma; discrete cosine transform (DCT)
DOI
Not available
CLC number
TN912.33 [Speech synthesis]
Subject classification code
0711
Abstract
We propose a Mandarin Chinese singing voice synthesis system based on the hidden Markov model (HMM)-based speech synthesis technique. A Mandarin Chinese singing voice corpus is recorded, and musical contextual features are carefully designed for training. The F0 and spectrum of the singing voice are modeled simultaneously with context-dependent HMMs. Singing voice raises a new problem: F0 training data are always sparse because of the large number of contexts, such as the tempo and pitch of each note, the key, and the time signature. As a result, features that rarely appear in the training data cannot be modeled well. To address this problem, the difference between the F0 of the singing voice and the F0 given by the musical score (DF0) is modeled by single Viterbi training. To overcome over-smoothing of the generated F0 contour, a syllable-level F0 model based on the discrete cosine transform (DCT) is applied, and the F0 contour is generated by integrating the two levels of statistical models. Experimental results demonstrate that the proposed system outperforms the baseline system in both objective and subjective evaluations. The proposed system generates a more natural F0 contour, and the syllable-level F0 model makes the singing voice more expressive.
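The abstract names two F0 representations, DF0 and a DCT-based syllable-level model, without giving their formulas. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes DF0 is taken in the natural-log F0 domain against equal-tempered note frequencies (A4 = MIDI note 69 = 440 Hz), that only voiced frames are passed in, and that the syllable-level descriptor keeps the first few coefficients of an orthonormal DCT-II of the syllable's log-F0 contour. The function names (`note_to_hz`, `df0_log`, `syllable_dct_features`) and the coefficient count are hypothetical.

```python
import math
from typing import List, Sequence

# Assumed reference tuning: A4 = MIDI note 69 = 440 Hz, 12-tone equal temperament.
A4_HZ = 440.0
A4_MIDI = 69


def note_to_hz(midi_note: int) -> float:
    """Score F0 of a note in Hz under 12-tone equal temperament."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


def df0_log(f0_hz: Sequence[float], midi_note: int) -> List[float]:
    """Hypothetical DF0: sung log F0 minus the score note's log F0, per frame.

    Assumption: every frame is voiced (f0 > 0); remove unvoiced frames first.
    """
    ref = math.log(note_to_hz(midi_note))
    return [math.log(f) - ref for f in f0_hz]


def _dct_scale(k: int, n_len: int) -> float:
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)


def dct2(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a contour (direct O(N^2) form for clarity)."""
    n_len = len(x)
    return [
        _dct_scale(k, n_len)
        * sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        for k in range(n_len)
    ]


def idct2(coeffs: Sequence[float], n_len: int) -> List[float]:
    """Inverse of dct2 (a DCT-III); coefficients beyond len(coeffs) count as 0."""
    return [
        sum(
            _dct_scale(k, n_len) * c * math.cos(math.pi / n_len * (n + 0.5) * k)
            for k, c in enumerate(coeffs)
        )
        for n in range(n_len)
    ]


def syllable_dct_features(log_f0: Sequence[float], n_coeffs: int = 5) -> List[float]:
    """Hypothetical syllable-level descriptor: the first n_coeffs DCT-II coefficients."""
    return dct2(log_f0)[:n_coeffs]


if __name__ == "__main__":
    # Toy syllable sung on MIDI note 60 (C4, about 261.63 Hz) with a small rise.
    f0 = [255.0, 258.0, 262.0, 266.0, 265.0, 263.0, 262.0, 261.0]
    d = df0_log(f0, 60)
    coeffs = syllable_dct_features([math.log(f) for f in f0], n_coeffs=3)
    # Reconstruct a smoothed contour from the truncated coefficients.
    smooth = [math.exp(v) for v in idct2(coeffs, len(f0))]
    print([round(v, 4) for v in d])
    print([round(v, 1) for v in smooth])
```

Keeping only a few low-order DCT coefficients retains the overall shape of a syllable's contour (mean level, slope, curvature) while discarding frame-level detail, which is why such a compact descriptor can complement a frame-level model. How the paper actually combines the two levels is not specified in this abstract.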
Pages: 192-202
Number of pages: 11
Related papers (50 in total)
  • [1] A HMM-based Mandarin Chinese Singing Voice Synthesis System
    Li, Xian
    Wang, Zengfu
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2016, 3 (02): 192 - 202
  • [2] An HMM-based Singing Voice Synthesis System
    Saino, Keijiro
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 2274 - 2277
  • [3] PITCH ADAPTIVE TRAINING FOR HMM-BASED SINGING VOICE SYNTHESIS
    Oura, Keiichiro
    Mase, Ayami
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 5377 - 5380
  • [4] HMM-BASED SINGING VOICE SYNTHESIS AND ITS APPLICATION TO JAPANESE AND ENGLISH
    Nakamura, Kazuhiro
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [5] An HMM-based Mandarin Chinese Text-to-Speech system
    Qian, Yao
    Soong, Frank
    Chen, Yining
    Chu, Min
    [J]. CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274: 223+
  • [6] HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling
    Nose, Takashi
    Kanemoto, Misa
    Koriyama, Tomoki
    Kobayashi, Takao
    [J]. COMPUTER SPEECH AND LANGUAGE, 2015, 34 (01): 308 - 322
  • [7] Factored Maximum Likelihood Kernelized Regression for HMM-based Singing Voice Synthesis
    Sung, June Sig
    Hong, Doo Hwa
    Koo, Hyun Woo
    Kim, Nam Soo
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 359 - 363
  • [8] INTEGRATION OF SPEAKER AND PITCH ADAPTIVE TRAINING FOR HMM-BASED SINGING VOICE SYNTHESIS
    Shirota, Kanako
    Nakamura, Kazuhiro
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [9] HMM-based synthesis of creaky voice
    Raitio, Tuomo
    Kane, John
    Drugman, Thomas
    Gobl, Christer
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 2315+
  • [10] HMM-based singing voice synthesis system using pitch-shifted pseudo training data
    Mase, Ayami
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 845 - 848