On Improving Acoustic Models For TORGO Dysarthric Speech Database

Cited by: 8
Authors
Joy, Neethu Mariam [1]
Umesh, S. [1]
Abraham, Basil [1]
Affiliations
[1] Indian Inst Technol Madras, Madras, Tamil Nadu, India
Keywords
Dysarthria; TORGO; GMM-HMM; DNN; RECOGNITION;
DOI
10.21437/Interspeech.2017-878
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Assistive technologies based on speech have been shown to improve the quality of life of people affected by dysarthria, a motor speech disorder. This paper explores multiple ways to improve Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network (DNN) based automatic speech recognition (ASR) systems for the TORGO database of dysarthric speech. Past attempts at developing ASR systems for the TORGO database were limited to training monophone models and applying speaker adaptation to them. Although a recent work attempted to train triphone and neural network models, parameters such as the number of context-dependent states and the dimensionality of the principal component features were not properly tuned. This paper develops speaker-specific ASR models for each dysarthric speaker in the TORGO database by tuning the parameters of the GMM-HMM model and the number of layers and hidden nodes in the DNN. Employing a dropout scheme and sequence-discriminative training in the DNN also gave significant gains. Speaker-adapted features such as feature-space maximum likelihood linear regression (FMLLR) are used to pass speaker information to the DNNs. To the best of our knowledge, this paper presents the best recognition accuracies for the TORGO database to date.
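As a rough illustration of two techniques named in the abstract, the sketch below shows what an FMLLR-style feature transform and a dropout step look like in isolation. It is not the authors' implementation: FMLLR is treated simply as a per-speaker affine map x' = A x + b applied to each feature frame, and the dropout shown is the common "inverted" variant. The function names, the toy 2-dimensional features, and the values of `A` and `b` are all hypothetical; how the transform is estimated from data is not covered here.

```python
import random


def apply_fmllr(frames, A, b):
    """Apply a per-speaker affine transform x' = A x + b to every feature frame."""
    return [
        [sum(a_ij * x_j for a_ij, x_j in zip(row, frame)) + b_i
         for row, b_i in zip(A, b)]
        for frame in frames
    ]


def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Toy 2-dimensional features and a made-up transform for one hypothetical speaker.
frames = [[1.0, 2.0], [0.5, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [0.1, -0.1]

# Speaker-adapted frames that would be fed to the network's input layer.
adapted = apply_fmllr(frames, A, b)

# Dropout is applied to hidden-layer activations during training only.
hidden = dropout([0.3, 1.2, 0.7, 0.9], p_drop=0.5, rng=random.Random(0))
```

In this sketch the speaker information reaches the network only through the transformed input features, which is the role the abstract assigns to FMLLR; the network itself stays speaker-independent in structure.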
Pages: 2695-2699
Number of pages: 5
Related papers
50 in total
  • [1] Improving Acoustic Models in TORGO Dysarthric Speech Database
    Joy, Neethu Mariam
    Umesh, S.
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2018, 26 (03) : 637 - 645
  • [2] The TORGO database of acoustic and articulatory speech from speakers with dysarthria
    Frank Rudzicz
    Aravind Kumar Namasivayam
    Talya Wolff
    Language Resources and Evaluation, 2012, 46 : 523 - 541
  • [3] The TORGO database of acoustic and articulatory speech from speakers with dysarthria
    Rudzicz, Frank
    Namasivayam, Aravind Kumar
    Wolff, Talya
    LANGUAGE RESOURCES AND EVALUATION, 2012, 46 (04) : 523 - 541
  • [4] DNN Acoustic Models for Dysarthric Speech
    Tejaswi, Seeram
    Umesh, S.
    2017 TWENTY-THIRD NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2017,
  • [5] ADAPTING ACOUSTIC AND LEXICAL MODELS TO DYSARTHRIC SPEECH
    Mengistu, Kinfe Tadesse
    Rudzicz, Frank
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4924 - 4927
  • [6] Improving Acoustic Models for Dysarthric Speech Recognition using Time Delay Neural Networks
    Misbullah, Alim
    Lin, Hai-Hsing
    Chang, Chia-Yuan
    Yeh, Hsiu-Wei
    Weng, Ko-Cheng
    2020 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICELTICS 2020), 2020, : 118 - 121
  • [7] The Nemours database of dysarthric speech
    MenendezPidal, X
    Polikoff, JB
    Peters, SM
    Leonzio, JE
    Bunnell, HT
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1962 - 1965
  • [8] ACOUSTIC EVALUATION OF DYSARTHRIC SPEECH
    HIROSE, H
    OHYAMA, G
    FOLIA PHONIATRICA, 1989, 41 (4-5) : 175 - 175
  • [9] ACOUSTIC DESCRIPTION OF DYSARTHRIC SPEECH
    LEHISTE, I
    TIKOFSKY, RS
    TIKOFSKY, RP
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1961, 33 (11) : 1677 - &
  • [10] Improving the intelligibility of dysarthric speech
    Kain, Alexander B.
    Hosom, John-Paul
    Niu, Xiaochuan
    van Santen, Jan P. H.
    Fried-Oken, Melanie
    Staehely, Janice
    SPEECH COMMUNICATION, 2007, 49 (09) : 743 - 759