Automatic Speech Recognition with Deep Neural Networks for Impaired Speech

Cited by: 38

Authors:

Espana-Bonet, Cristina [1,2]

Fonollosa, Jose A. R. [1]

Affiliations:

[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain

[2] Univ Saarland, Saarbrucken, Germany

Source:

ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016 | 2016, Vol. 10077

Keywords:

Speech recognition; Speaker adaptation; Deep learning; Neural networks; Dysarthria; Kaldi

DOI:

10.1007/978-3-319-49169-1_10

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatic Speech Recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech remains difficult for two main reasons: the data are (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models in terms of word error rate. For subjects with dysarthria, a DNN reduces the word error rate by 13% with respect to the best classical architecture. This improvement is larger than the one obtained with other deep neural networks such as CNNs, TDNNs and LSTMs. All experiments were run with the Kaldi toolkit for speech recognition, for which we have adapted several recipes to handle dysarthric speech and to work with the TORGO database. These recipes are publicly available.
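The abstract compares systems by word error rate (WER). As a reading aid, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and how a relative reduction between two systems can be derived from it. It is not the authors' code or the Kaldi scoring tool; the function names and the example sentences are hypothetical, and the abstract does not state whether its 13% figure is relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (hypothetical helper)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up sentences, not taken from TORGO.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(relative_reduction(0.40, 0.30))
```

With this convention, a system that lowers WER from 0.40 to 0.30 shows a 10-point absolute reduction and a 25% relative reduction, which is why the two readings of a reported percentage should be kept apart.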

Pages: 97-107

Page count: 11
