Dysarthric Speech Recognition Based on Deep Metric Learning

Cited by: 4

Authors:

Takashima, Yuki [1]

Takashima, Ryoichi [2]

Takiguchi, Tetsuya [2]

Ariki, Yasuo [2]

Affiliations:

[1] Hitachi Ltd, Res & Dev Grp, Hitachi, Ibaraki, Japan

[2] Kobe Univ, Grad Sch Syst Informat, Kobe, Hyogo, Japan

Source:

INTERSPEECH 2020 | 2020

Keywords:

assistive technology; dysarthria; metric learning; speech recognition; features; database

DOI:

10.21437/Interspeech.2020-2267

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

We present in this paper an automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. Because their utterances are often unstable or unclear, speech recognition systems have difficulty recognizing the speech of those with this disorder. For example, their speech styles often fluctuate greatly even when they are repeating the same sentences. For this reason, their speech tends to have great variation even within recognition classes. To alleviate this intra-class variation problem, we propose an ASR system based on deep metric learning. This system learns an embedded representation that is characterized by a small distance between input utterances of the same class, while the distance of the input utterances of different classes is large. Therefore, our method makes it easy for the ASR system to distinguish dysarthric speech. Experimental results show that our proposed approach using deep metric learning improves the word-recognition accuracy consistently. Moreover, we also evaluate the combination of our proposed method and transfer learning from unimpaired speech to alleviate the low-resource problem associated with impaired speech.
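
The embedding objective described in the abstract (small distance between utterances of the same class, large distance between utterances of different classes) can be illustrated with a triplet loss. The abstract does not say which metric-learning loss the authors use, so the triplet formulation, the function name `triplet_loss`, the `margin` value, and the toy 2-D embeddings below are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over batches of embeddings, shape (batch, dim).

    Assumed formulation for illustration only; the paper's actual loss
    is not given in the abstract.
    """
    # Squared Euclidean distance from each anchor to a same-class embedding.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    # Squared Euclidean distance from each anchor to a different-class embedding.
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Penalize triplets where the same-class pair is not closer by at least `margin`.
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))

# Hypothetical 2-D embeddings of three utterances.
anchor = np.array([[0.0, 0.0]])
same_class = np.array([[0.1, 0.0]])
other_class = np.array([[3.0, 0.0]])

# Same-class utterance is close, other-class utterance is far: no penalty.
well_separated = triplet_loss(anchor, same_class, other_class)
# Roles swapped: the "same-class" utterance is far away, so the loss is large.
poorly_separated = triplet_loss(anchor, other_class, same_class)
```

Minimizing a loss of this kind over many triplets pulls repeated utterances of the same word together in the embedding space, which is one way to reduce the intra-class variation the abstract describes.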


Pages: 4796-4800

Page count: 5
