An analysis of HMM-based prediction of articulatory movements

Cited: 45
Authors
Ling, Zhen-Hua [1 ]
Richmond, Korin [2 ]
Yamagishi, Junichi [2 ]
Affiliations
[1] Univ Sci & Technol China, iFLYTEK Speech Lab, Hefei 230027, Anhui, Peoples R China
[2] Univ Edinburgh, CSTR, Edinburgh EH8 9LW, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Hidden Markov model; Articulatory features; Parameter generation; Acoustics
DOI
10.1016/j.specom.2010.06.006
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
This paper presents an investigation into predicting the movement of a speaker's mouth from text input using hidden Markov models (HMMs). A corpus of human articulatory movements, recorded by electromagnetic articulography (EMA), is used to train HMMs. To predict articulatory movements for input text, a suitable model sequence is selected and a maximum-likelihood parameter generation (MLPG) algorithm is used to generate output articulatory trajectories. Unified acoustic-articulatory HMMs are introduced to integrate acoustic features when an acoustic signal is also provided with the input text. Several aspects of this method are analyzed in this paper, including the effectiveness of context-dependent modeling, the role of supplementary acoustic input, and the appropriateness of certain model structures for the unified acoustic-articulatory models. When text is the sole input, we find that fully context-dependent models significantly outperform monophone and quinphone models, achieving an average root mean square (RMS) error of 1.945 mm and an average correlation coefficient of 0.600. When both text and acoustic features are given as input to the system, the difference between the performance of quinphone models and fully context-dependent models is no longer significant. The best performance overall is achieved using unified acoustic-articulatory quinphone HMMs with separate clustering of acoustic and articulatory model parameters, a synchronous-state sequence, and a dependent-feature model structure, with an RMS error of 0.900 mm and a correlation coefficient of 0.855 on average. Finally, we also apply the same quinphone HMMs to the acoustic-to-articulatory, or inversion, mapping problem, where only acoustic input is available, achieving an average RMS error of 1.076 mm and an average correlation coefficient of 0.812. Taken together, our results demonstrate how text and acoustic inputs both contribute to the prediction of articulatory movements in the method used. (C) 2010 Elsevier B.V. All rights reserved.
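The MLPG step described in the abstract can be sketched briefly. Below is a minimal, illustrative Python implementation for a single articulatory channel, assuming a static-plus-delta feature window, diagonal covariances, and a per-frame sequence of Gaussian means and variances already obtained from the selected HMM state sequence. The names mlpg, means, and variances are hypothetical, and the boundary-clamped central-difference delta window is one common choice, not necessarily the one used in the paper.

    import numpy as np

    def mlpg(means, variances):
        # Generate the smooth static trajectory c that maximizes the likelihood
        # of the stacked observation [c; delta(c)] under per-frame Gaussians.
        # means, variances: arrays of shape (T, 2) holding the mean and variance
        # of the static and delta feature at each frame.
        T = means.shape[0]
        # Window matrix W (2T x T): the top block copies the static coefficients,
        # the bottom block applies a boundary-clamped central-difference delta.
        W = np.zeros((2 * T, T))
        W[:T, :] = np.eye(T)
        for t in range(T):
            W[T + t, max(t - 1, 0)] -= 0.5
            W[T + t, min(t + 1, T - 1)] += 0.5
        mu = np.concatenate([means[:, 0], means[:, 1]])
        prec = 1.0 / np.concatenate([variances[:, 0], variances[:, 1]])
        # Normal equations of the weighted least-squares problem:
        # (W' P W) c = W' P mu, where P is the diagonal precision matrix.
        A = W.T @ (prec[:, None] * W)
        b = W.T @ (prec * mu)
        return np.linalg.solve(A, b)

    # Toy usage: five frames with a rising static mean, zero delta means, and
    # unit variances; the delta constraints smooth the generated trajectory.
    trajectory = mlpg(np.column_stack([np.linspace(0.0, 1.0, 5), np.zeros(5)]),
                      np.ones((5, 2)))
    print(trajectory)

In practice the system matrix W' P W is banded, so efficient implementations solve it with a banded Cholesky factorization rather than the dense solve used in this sketch.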
Pages: 834-846
Page count: 13
Related Papers
50 records in total
  • [1] HMM-based Text-to-Articulatory-Movement Prediction and Analysis of Critical Articulators
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 2194+
  • [2] EVALUATION OF LINEAR REGRESSION FOR SPEAKER ADAPTATION IN HMM-BASED ARTICULATORY MOVEMENTS ESTIMATION
    Li, Hao
    Tao, Jianhua
    Wang, Yang
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4944-4948
  • [3] Target-Filtering Model based Articulatory Movement Prediction for Articulatory Control of HMM-based Speech Synthesis
    Cai, Ming-Qi
    Ling, Zhen-Hua
    Dai, Li-Rong
PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), VOLS 1-3, 2012: 605-608
  • [4] Estimation of articulatory movements from speech acoustics using an HMM-based speech production model
    Hiroya, S
    Honda, M
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2004, 12(2): 175-185
  • [5] Determination of articulatory movements from speech acoustics using an HMM-based speech production model
    Hiroya, S
    Honda, M
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 437-440
  • [6] An HMM-based speech recognizer using overlapping articulatory features
    Erler, K
    Freeman, GH
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100(4): 2500-2513
  • [7] Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    Wang, Ren-Hua
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17(6): 1171-1185
  • [8] Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 990-993
  • [9] Articulatory Control of HMM-based Parametric Speech Synthesis Driven by Phonetic Knowledge
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    Wang, Ren-Hua
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 573+