Using articulatory likelihoods in the recognition of dysarthric speech

Cited by: 27
Author
Rudzicz, Frank [1]
Affiliation
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
Keywords
Dysarthria; Speech recognition; Acoustic-articulatory inversion; Task-dynamics; RECOVERING ARTICULATION; MODEL;
DOI
10.1016/j.specom.2011.10.006
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Millions of individuals have congenital or acquired neuro-motor conditions that limit control of their muscles, including those that manipulate the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand both by human listeners and by traditional automatic speech recognition (ASR), which in some cases can be rendered completely unusable. In this work we first introduce a new method for acoustic-to-articulatory inversion which estimates positions of the vocal tract given acoustics using a nonlinear Hammerstein system. This is accomplished based on the theory of task-dynamics using the TORGO database of dysarthric articulation. Our approach uses adaptive kernel canonical correlation analysis and is found to be significantly more accurate than mixture density networks, at or above the 95% level of confidence for most vocal tract variables. Next, we introduce a new method for ASR in which acoustic-based hypotheses are re-evaluated according to the likelihoods of their articulatory realizations in task-dynamics. This approach incorporates high-level, long-term aspects of speech production and is found to be significantly more accurate than hidden Markov models, dynamic Bayesian networks, and switching Kalman filters. © 2011 Elsevier B.V. All rights reserved.
Pages: 430-444
Page count: 15