Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech

Cited by: 11

Authors:

Yilmaz, Emre [1]

Ganzeboom, Mario [1]

Cucchiarini, Catia [1]

Strik, Helmer [1]

Affiliations:

[1] Radboud Univ Nijmegen, CLS CLST, Nijmegen, Netherlands

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

Pathological speech; automatic speech recognition; deep neural networks; dysarthria; INTELLIGIBILITY; THERAPY; INTENSITY; STROKE; IMPACT;

DOI:

10.21437/Interspeech.2017-303

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Incorporating automatic speech recognition (ASR) in individualized speech training applications is becoming more viable thanks to the improved generalization capabilities of neural network-based acoustic models. The main problem in developing applications for dysarthric speech is the relative in-domain data scarcity. Collecting representative amounts of dysarthric speech data is difficult due to rigorous ethical and medical permission requirements, problems in accessing patients who are generally vulnerable and often subject to altering health conditions and, last but not least, the high variability in speech resulting from different pathological conditions. Developing such applications is even more challenging for languages which in general have fewer resources, fewer speakers and, consequently, also fewer patients than English, as in the case of a mid-sized language like Dutch. In this paper, we investigate a multi-stage deep neural network (DNN) training scheme aimed at obtaining better modeling of dysarthric speech by using only a small amount of in-domain training data. The results show that the system employing the proposed training scheme considerably improves the recognition of Dutch dysarthric speech compared to a baseline system with single-stage training only on a large amount of normal speech or a small amount of in-domain data.

Pages: 2685-2689

Page count: 5
