Dysarthric Speech Recognition Based on Deep Metric Learning

Cited by: 4
Authors
Takashima, Yuki [1]
Takashima, Ryoichi [2]
Takiguchi, Tetsuya [2]
Ariki, Yasuo [2]
Affiliations
[1] Hitachi Ltd, Res & Dev Grp, Hitachi, Ibaraki, Japan
[2] Kobe Univ, Grad Sch Syst Informat, Kobe, Hyogo, Japan
Source
Keywords
assistive technology; dysarthria; metric learning; speech recognition; FEATURES; DATABASE;
DOI
10.21437/Interspeech.2020-2267
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
This paper presents an automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. Because the utterances of people with this disorder are often unstable or unclear, speech recognition systems have difficulty recognizing them. For example, their speaking style often fluctuates greatly even when they repeat the same sentence, so their speech tends to vary widely even within a single recognition class. To alleviate this intra-class variation problem, we propose an ASR system based on deep metric learning. The system learns an embedded representation in which the distance between input utterances of the same class is small and the distance between input utterances of different classes is large, which makes it easier for the ASR system to distinguish dysarthric speech. Experimental results show that the proposed approach consistently improves word-recognition accuracy. We also evaluate the combination of the proposed method with transfer learning from unimpaired speech, which is intended to alleviate the low-resource problem associated with impaired speech.
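The abstract's core idea, pulling same-class utterances together and pushing different-class utterances apart in an embedding space, can be illustrated with a small sketch. The abstract does not say which metric-learning objective the authors use, so the triplet loss below is an assumption: it is one common objective of this kind, not necessarily theirs. The function names, the margin value, and the 2-D embedding vectors are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (an assumed objective, not taken from the paper).

    The loss is zero once the different-class sample (negative) is at least
    `margin` farther from the anchor than the same-class sample (positive).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D embeddings before training: two takes of word A lie far
# apart (intra-class variation) while a take of word B lies close to the anchor.
word_a_take1 = [0.0, 0.0]
word_a_take2 = [3.0, 4.0]
word_b_take1 = [1.0, 0.0]
loss_before = triplet_loss(word_a_take1, word_a_take2, word_b_take1)

# Hypothetical embeddings after training: the same-class take has moved close
# to the anchor and the different-class take has moved far away.
word_a_take2_trained = [0.3, 0.4]
word_b_take1_trained = [6.0, 8.0]
loss_after = triplet_loss(word_a_take1, word_a_take2_trained, word_b_take1_trained)

print(loss_before, loss_after)
```

In a real system the embeddings would come from a neural network trained by minimizing such a loss over many triplets; the hand-picked vectors here only show how the loss penalizes large intra-class distances relative to inter-class distances.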
Pages: 4796-4800
Number of pages: 5