Automatic Speech Recognition with Deep Neural Networks for Impaired Speech

Cited by: 38
Authors
Espana-Bonet, Cristina [1, 2]
Fonollosa, Jose A. R. [1]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
[2] Univ Saarland, Saarbrucken, Germany
Keywords
Speech recognition; Speaker adaptation; Deep learning; Neural networks; Dysarthria; Kaldi
DOI
10.1007/978-3-319-49169-1_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic speech recognition has reached near-human performance in some controlled scenarios. However, recognizing impaired speech remains difficult for two main reasons: the available data are (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models in terms of word error rate (WER). For subjects with dysarthria, a DNN improves the WER by 13% with respect to the best classical architecture. This improvement is larger than the one obtained with other deep neural networks such as CNNs, TDNNs and LSTMs. All experiments were carried out with the Kaldi speech recognition toolkit, for which we adapted several recipes to deal with dysarthric speech and to work with the TORGO database. These recipes are publicly available.
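The comparison in the abstract rests on word error rate. As an illustration only (this is not Kaldi's scoring code, and the function names below are hypothetical), WER can be sketched as the word-level Levenshtein distance (substitutions + deletions + insertions) summed over utterances and divided by the number of reference words. The abstract does not state whether the 13% figure is an absolute or a relative reduction, so the sketch also includes a hypothetical helper for the relative form.

```python
from typing import Sequence


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER in percent: total errors over total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned")
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("no reference words")
    return 100.0 * errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent (hypothetical helper)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deletion over six reference words, i.e. a WER of about 16.7%. Summing errors over the whole corpus before dividing, rather than averaging per-utterance WERs, keeps short utterances from dominating the score.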
Pages: 97 - 107
Number of pages: 11
Related papers
50 in total
  • [1] Automatic Recognition of Kazakh Speech Using Deep Neural Networks
    Mamyrbayev, Orken
    Turdalyuly, Mussa
    Mekebayev, Nurbapa
    Alimhan, Keylan
    Kydyrbekova, Aizat
    Turdalykyzy, Tolganay
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT II, 2019, 11432 : 465 - 474
  • [2] AUTOMATIC SPEECH RECOGNITION OF IMPAIRED SPEECH
    CARLSON, GS
    BERNSTEIN, J
    [J]. INTERNATIONAL JOURNAL OF REHABILITATION RESEARCH, 1988, 11 (04) : 396 - 398
  • [3] Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
    Wu, Jibin
    Yilmaz, Emre
    Zhang, Malu
    Li, Haizhou
    Tan, Kay Chen
    [J]. FRONTIERS IN NEUROSCIENCE, 2020, 14
  • [4] Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition
    Sainath, Tara N.
    Weiss, Ron J.
    Wilson, Kevin W.
    Li, Bo
    Narayanan, Arun
    Variani, Ehsan
    Bacchiani, Michiel
    Shafran, Izhak
    Senior, Andrew
    Chin, Kean
    Misra, Ananya
    Kim, Chanwoo
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (05) : 965 - 979
  • [5] Robust Speech Recognition with Speech Enhanced Deep Neural Networks
    Du, Jun
    Wang, Qing
    Gao, Tian
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 616 - 620
  • [6] Automatic Speech Recognition Based on Neural Networks
    Schlueter, Ralf
    Doetsch, Patrick
    Golik, Pavel
    Kitza, Markus
    Menne, Tobias
    Irie, Kazuki
    Tueske, Zoltan
    Zeyer, Albert
    [J]. SPEECH AND COMPUTER, 2016, 9811 : 3 - 17
  • [7] DEEP MAXOUT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Cai, Meng
    Shi, Yongzhe
    Liu, Jia
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 291 - 296
  • [8] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
    Graves, Alex
    Mohamed, Abdel-rahman
    Hinton, Geoffrey
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6645 - 6649
  • [9] Deep Neural Networks in Russian Speech Recognition
    Markovnikov, Nikita
    Kipyatkova, Irina
    Karpov, Alexey
    Filchenkov, Andrey
    [J]. ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 54 - 67
  • [10] Deep Segmental Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Deng, Li
    Yu, Dong
    Jiang, Hui
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1848 - 1852