Statistical parametric speech synthesis using a hidden trajectory model

Times cited: 2
Authors
Cai, Ming-Qi [1]
Ling, Zhen-Hua [1]
Dai, Li-Rong [1]
Affiliation
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei 230026, Anhui, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Speech synthesis; Hidden Markov model; Hidden trajectory model; Speech production; Emotional expressions; Speaking styles; Generation
DOI
10.1016/j.specom.2015.05.008
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A novel spectral modeling method for statistical parametric speech synthesis using a hidden trajectory model (HTM) is presented in this paper. An HTM is a structured generative model with a two-stage implementation. First, hidden formant trajectories are generated from time-aligned formant target sequences by a bidirectional filter. This target-filtering model can provide a correlation structure across temporal frames and efficiently describe the effect of co-articulation on speech signals. Then, the observed cepstral features are composed of a formant-related component and a residual component. The formant-related component is predicted from the hidden formant trajectories using a nonlinear analytical function, and the prediction residuals are modeled by context-dependent Gaussians. In this paper, we apply HTM-based acoustic modeling to speech synthesis and investigate the effectiveness of this method in improving the naturalness and controllability of synthetic speech. Experimental results show that the proposed method can improve the accuracy of spectral feature prediction and the naturalness of synthetic speech compared with the conventional HMM-based method, especially when the amount of training data is limited. Furthermore, the method can achieve effective control over the vowel quality and formant sharpness of synthetic speech by conveniently manipulating the distribution parameters of the phone-dependent targets of formant frequencies and bandwidths. (C) 2015 Elsevier B.V. All rights reserved.
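The two-stage generative structure described in the abstract can be sketched as follows. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: it assumes the target filter is a normalized bidirectional FIR filter with weights γ^|k−τ| over a window of ±D frames (normalized per frame here for simplicity), and that the "nonlinear analytical function" is the all-pole mapping from formant frequencies and bandwidths to linear-prediction cepstra. The exact filter normalization, feature type, and mapping used in the paper may differ. The function names `target_filter` and `formants_to_cepstrum`, the stiffness values γ, the window half-length D, the 16 kHz sampling rate, and all target values are hypothetical. The context-dependent Gaussian residual, which the full model adds to the formant-related component, is omitted.

```python
import math
from typing import List, Sequence


def target_filter(targets: Sequence[float], gammas: Sequence[float], depth: int) -> List[float]:
    """Bidirectional (non-causal) FIR smoothing of a frame-level target sequence.

    targets[t]: formant target (Hz) of the segment that frame t belongs to.
    gammas[t]:  stiffness parameter of that segment, 0 < gamma < 1 (hypothetical values).
    depth:      half-length D of the filter window, in frames.
    """
    n_frames = len(targets)
    track = []
    for k in range(n_frames):
        num, den = 0.0, 0.0
        # Window of frames k-D..k+D, truncated at the sequence edges.
        for tau in range(max(0, k - depth), min(n_frames, k + depth + 1)):
            w = gammas[tau] ** abs(k - tau)  # weight decays with distance from frame k
            num += w * targets[tau]
            den += w
        track.append(num / den)  # per-frame normalization: weights sum to 1
    return track


def formants_to_cepstrum(freqs: Sequence[float], bws: Sequence[float],
                         fs: float, order: int) -> List[float]:
    """All-pole mapping from formant frequencies/bandwidths (Hz) to cepstra c_1..c_order.

    Each formant is a conjugate pole pair with radius exp(-pi*b/fs) and angle 2*pi*f/fs,
    which contributes (2/n) * exp(-pi*n*b/fs) * cos(2*pi*n*f/fs) to the n-th coefficient.
    """
    return [
        sum((2.0 / n) * math.exp(-math.pi * n * b / fs) * math.cos(2.0 * math.pi * n * f / fs)
            for f, b in zip(freqs, bws))
        for n in range(1, order + 1)
    ]


if __name__ == "__main__":
    # Two hypothetical segments of 5 frames each, with F1 targets of 700 Hz and 300 Hz.
    f1_targets = [700.0] * 5 + [300.0] * 5
    f1_track = target_filter(f1_targets, [0.6] * 10, depth=3)
    print([round(f) for f in f1_track])  # smooth transition instead of a step

    # Formant-related cepstral component for frame 4 (F2 fixed at 1200 Hz; illustrative).
    sharp = formants_to_cepstrum([f1_track[4], 1200.0], [80.0, 100.0], fs=16000.0, order=5)
    # Doubling the bandwidth targets flattens the spectral peaks (smaller |c_n| per formant).
    broad = formants_to_cepstrum([f1_track[4], 1200.0], [160.0, 200.0], fs=16000.0, order=5)
    print(sharp[:3], broad[:3])
```

Under these assumptions, shifting a phone's formant-frequency target moves the whole smoothed trajectory around that phone, which loosely mirrors the vowel-quality control mentioned in the abstract, while widening the bandwidth targets shrinks each formant's cepstral contribution by a factor of exp(-πnΔb/fs), corresponding to less sharp formant peaks.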
Pages: 149-159
Number of pages: 11
Related papers (50 in total)
  • [1] Hidden-Markov-model based statistical parametric speech synthesis for Marathi with optimal number of hidden states
    Patil, Suraj Pandurang
    Lahudkar, Swapnil Laxman
    [J]. International Journal of Speech Technology, 2019, 22 (01): 93-98
  • [2] A Continuous Vocoder Using Sinusoidal Model for Statistical Parametric Speech Synthesis
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    [J]. Speech and Computer (SPECOM 2018), 2018, 11096: 11-20
  • [3] Statistical parametric speech synthesis
    Black, Alan W.
    Zen, Heiga
    Tokuda, Keiichi
    [J]. 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol IV, Pts 1-3, 2007: 1229-+
  • [4] Statistical parametric speech synthesis
    Zen, Heiga
    Tokuda, Keiichi
    Black, Alan W.
    [J]. Speech Communication, 2009, 51 (11): 1039-1064
  • [5] Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis
    Khorram, Soheil
    Sameti, Hossein
    Bahmaninezhad, Fahimeh
    King, Simon
    Drugman, Thomas
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2014
  • [6] Extracting Spectral Features Using Deep Autoencoders With Binary Distributed Hidden Units for Statistical Parametric Speech Synthesis
    Hu, Ya-Jun
    Ling, Zhen-Hua
    [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26 (04): 713-724
  • [7] Speaker Adaptation for Slovak Statistical Parametric Speech Synthesis Based on Hidden Markov Models
    Sulir, Martin
    Juhar, Jozef
    [J]. 2015 25th International Conference Radioelektronika (RADIOELEKTRONIKA), 2015: 137-140
  • [8] Statistical Parametric Speech Synthesis Using Deep Neural Networks
    Zen, Heiga
    Senior, Andrew
    Schuster, Mike
    [J]. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 7962-7966