Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression

Cited by: 35
Authors
Ling, Zhen-Hua [1 ]
Richmond, Korin [2 ]
Yamagishi, Junichi [2 ]
Affiliations
[1] Univ Sci & Technol China, iFLYTEK Speech Lab, Hefei 230027, Peoples R China
[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013, Vol. 21, No. 1
Funding
Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China;
Keywords
Articulatory features; Gaussian mixture model; multiple-regression hidden Markov model; speech synthesis; MOVEMENTS; ADAPTATION; EXTRACTION; TRACKING;
DOI
10.1109/TASL.2012.2215600
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
In previous work we proposed a method to control the characteristics of synthetic speech flexibly by integrating articulatory features into a hidden Markov model (HMM) based parametric speech synthesizer. In this method, a unified acoustic-articulatory model is trained, and context-dependent linear transforms are used to model the dependency between the two feature streams. In this paper, we go significantly further and propose a feature-space-switched multiple regression HMM to improve the performance of articulatory control. A multiple regression HMM (MRHMM) is adopted to model the distribution of acoustic features, with articulatory features used as exogenous "explanatory" variables. A separate Gaussian mixture model (GMM) is introduced to model the articulatory space, and articulatory-to-acoustic regression matrices are trained for each component of this GMM rather than for the context-dependent states of the HMM. Furthermore, we propose a task-specific context feature tailoring method to ensure compatibility between state context features and the articulatory features that are manipulated at synthesis time. The proposed method is evaluated on two tasks, using a speech database with acoustic waveforms and articulatory movements recorded in parallel by electromagnetic articulography (EMA). In a vowel identity modification task, the new method outperforms our previous approach when reconstructing target vowels by varying the articulatory inputs. In a second vowel creation task, the new method proves highly effective at producing a new vowel from appropriate articulatory representations; even though no acoustic samples of this vowel are present in the training data, the result is shown to sound highly natural.
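The key mechanism in the abstract is switching articulatory-to-acoustic regression matrices by the articulatory space (via GMM component posteriors) rather than by HMM state. The following is a minimal numerical sketch of that idea only, not the paper's implementation: all dimensions, parameter values, and function names (`gmm_posteriors`, `regressed_acoustic_mean`) are toy illustrations, and the acoustic mean is shifted by the posterior-weighted sum of per-component linear regressions of the articulatory vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: K-component articulatory GMM, DA-dim articulatory (EMA)
# features, DX-dim acoustic mean vector. All values are illustrative.
K, DA, DX = 3, 6, 4

# Articulatory-space GMM: weights, means, diagonal variances.
weights = np.array([0.5, 0.3, 0.2])
means = rng.normal(size=(K, DA))
variances = np.ones((K, DA))

# One articulatory-to-acoustic regression matrix (plus bias) per GMM
# component, instead of per context-dependent HMM state.
A = rng.normal(scale=0.1, size=(K, DX, DA))
b = rng.normal(scale=0.1, size=(K, DX))

def gmm_posteriors(x):
    """Posterior probability p(k | x) of each GMM component given x."""
    log_probs = np.empty(K)
    for k in range(K):
        diff = x - means[k]
        ll = -0.5 * np.sum(diff**2 / variances[k]
                           + np.log(2.0 * np.pi * variances[k]))
        log_probs[k] = np.log(weights[k]) + ll
    log_probs -= log_probs.max()          # stabilise the softmax
    p = np.exp(log_probs)
    return p / p.sum()

def regressed_acoustic_mean(mu_state, x):
    """Shift a state's acoustic mean by the posterior-weighted regression
    of the articulatory vector x (the feature-space switching step)."""
    gamma = gmm_posteriors(x)
    shift = sum(gamma[k] * (A[k] @ x + b[k]) for k in range(K))
    return mu_state + shift

mu_state = np.zeros(DX)                   # nominal acoustic mean of one state
x = rng.normal(size=DA)                   # observed or manipulated articulation
print(regressed_acoustic_mean(mu_state, x))
```

Changing `x` at synthesis time (e.g. moving a tongue-position coordinate) changes which regression matrices dominate and hence the resulting acoustic mean, which is the intuition behind controlling vowel identity through articulatory inputs.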
Pages: 205-217
Page count: 13
Related papers
50 records total
  • [1] Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 990 - 993
  • [2] Articulatory Control of HMM-based Parametric Speech Synthesis Driven by Phonetic Knowledge
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    Wang, Ren-Hua
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 573 - +
  • [3] Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    Wang, Ren-Hua
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (06): : 1171 - 1185
  • [4] Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-based Speech Synthesis
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 124 - +
  • [5] A HMM Based Speech Synthesis Method Using Articulatory Feature
    Li, Yong
    Yin, Qing
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 185 - 189
  • [6] Target-Filtering Model based Articulatory Movement Prediction for Articulatory Control of HMM-based Speech Synthesis
    Cai, Ming-Qi
    Ling, Zhen-Hua
    Dai, Li-Rong
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 605 - 608
  • [7] An HMM-based speech recognizer using overlapping articulatory features
    Erler, K
    Freeman, GH
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100 (04): : 2500 - 2513
  • [8] HMM-based speech recognizer using overlapping articulatory features
    Erler, Kevin
    Freeman, George H.
    Journal of the Acoustical Society of America, 1996, 100 (4 pt 1):
  • [9] Improved Training of Excitation for HMM-based Parametric Speech Synthesis
    Shiga, Yoshinori
    Toda, Tomoki
    Sakai, Shinsuke
    Kawai, Hisashi
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 809 - 812
  • [10] Speaker Adaptation using Nonlinear Regression Techniques for HMM-based Speech Synthesis
    Hong, Doo Hwa
    Kang, Shin Jae
    Lee, Joun Yeop
    Kim, Nam Soo
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 586 - 589