Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition

Cited by: 31
Authors
Kim, Myungjong [1 ]
Kim, Younggwan [2 ]
Yoo, Joohong [2 ]
Wang, Jun [1 ]
Kim, Hoirin [2 ]
Affiliations
[1] Univ Texas Dallas, Dept Bioengn, Richardson, TX 75080 USA
[2] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon 305701, South Korea
Funding
National Research Foundation, Singapore; National Institutes of Health, USA
Keywords
Dysarthria; speech recognition; speaker adaptation; KL-HMM; regularization; Kullback-Leibler divergence; acoustic model
DOI
10.1109/TNSRE.2017.2681691
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
This paper addresses the problem of recognizing speech uttered by patients with dysarthria, a motor speech disorder that impedes the physical production of speech. Patients with dysarthria have articulatory limitations and therefore often have trouble pronouncing certain sounds, which results in undesirable phonetic variation. Modern automatic speech recognition systems designed for regular speakers are ineffective for dysarthric speakers because of this phonetic variation. To capture the phonetic variation, a Kullback-Leibler divergence-based hidden Markov model (KL-HMM) is adopted, in which the emission probability of each state is parameterized by a categorical distribution using phoneme posterior probabilities obtained from a deep neural network-based acoustic model. To further reflect speaker-specific phonetic variation patterns, a speaker adaptation method is proposed that combines L2 regularization with confusion-reducing regularization, which can enhance discriminability between the categorical distributions of the KL-HMM states while preserving speaker-specific information. Evaluation of the proposed speaker adaptation method on a database of several hundred words for 30 speakers, consisting of 12 mildly dysarthric, 8 moderately dysarthric, and 10 non-dysarthric control speakers, showed that the proposed approach significantly outperformed the conventional deep neural network-based speaker-adapted system on dysarthric as well as non-dysarthric speech.
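As a rough illustration of the KL-HMM idea described in the abstract, the sketch below scores one frame's phoneme posterior vector against per-state categorical distributions using the Kullback-Leibler divergence. It is not the authors' implementation: the direction of the divergence (state distribution first), the smoothing constant `eps`, the function names, and the toy three-phoneme numbers are all assumptions made for this example, and the regularized adaptation step is not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two categorical distributions of equal length.

    eps is an assumed smoothing constant that avoids division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def best_state(state_dists, posterior):
    """Score every state against one posterior frame; lower is a better match."""
    scores = [kl_divergence(y, posterior) for y in state_dists]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy setup: two states over three phoneme classes.
states = [
    [0.8, 0.1, 0.1],  # state 0 mostly expects phoneme 0
    [0.1, 0.8, 0.1],  # state 1 mostly expects phoneme 1
]
posterior = [0.7, 0.2, 0.1]  # assumed DNN posterior for a single frame

index, scores = best_state(states, posterior)
print(index, scores)
```

In a full decoder these per-frame scores would serve as local costs inside Viterbi decoding; here only the per-frame comparison is illustrated.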
Pages: 1581-1591
Page count: 11
Related papers
50 in total
  • [1] Grapheme-Based Automatic Speech Recognition Using KL-HMM
    Magimai-Doss, Mathew
    Rasipuram, Ramya
    Aradilla, Guillermo
    Bourlard, Herve
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 452 - 455
  • [2] KL-HMM BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS
    Madikeri, Srikanth
    Bourlard, Herve
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4435 - 4439
  • [3] COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION
    Madikeri, Srikanth
    Motlicek, Petr
    Bourlard, Herve
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4834 - 4838
  • [4] SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
    Soleymanpour, Mohammad
    Johnson, Michael T.
    Soleymanpour, Rahim
    Berry, Jeffrey
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7382 - 7386
  • [5] SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION
    Wang, Disong
    Liu, Songxiang
    Wu, Xixin
    Lu, Hui
    Sun, Lifa
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6677 - 6681
  • [6] An integrated study of speaker normalisation and HMM adaptation for noise robust speaker-independent speech recognition
    Hariharan, R
    Viikki, O
    SPEECH COMMUNICATION, 2002, 37 (3-4) : 349 - 361
  • [7] Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
    Geng, Mengzhe
    Xie, Xurong
    Ye, Zi
    Wang, Tianzi
    Li, Guinan
    Hu, Shujie
    Liu, Xunying
    Meng, Helen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2597 - 2611
  • [8] FAST SPEAKER ADAPTATION OF HYBRID NN/HMM MODEL FOR SPEECH RECOGNITION BASED ON DISCRIMINATIVE LEARNING OF SPEAKER CODE
    Abdel-Hamid, Ossama
    Jiang, Hui
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7942 - 7946
  • [9] Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-taper Spectral Estimation
    Bhat, Chitralekha
    Vachhani, Bhavik
    Kopparapu, Sunil
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 228 - 232
  • [10] Speaker Independent Urdu Speech Recognition Using HMM
    Ashraf, Javed
    Iqbal, Naveed
    Khattak, Naveed Sarfraz
    Zaidi, Ather Mohsin
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2010, 6177 : 140 - 148