SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

Cited by: 10
Authors
Soleymanpour, Mohammad [1]
Johnson, Michael T. [1]
Soleymanpour, Rahim [2]
Berry, Jeffrey [3]
Affiliations
[1] Univ Kentucky, Elect & Comp Engn, Lexington, KY 40506 USA
[2] Univ Connecticut, Dept Biomed Engn, Storrs, CT 06269 USA
[3] Marquette Univ, Speech Pathol & Audiol, Milwaukee, WI 53201 USA
Funding
National Institutes of Health (USA)
Keywords
Dysarthria; Speech recognition; Speech-To-Text; Synthesized speech; Data augmentation
DOI
10.1109/ICASSP43922.2022.9746585
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. Robust dysarthria-specific ASR requires sufficient training speech, which is not readily available. Recent advances in multi-speaker end-to-end Text-To-Speech (TTS) synthesis systems suggest the possibility of using synthesis for data augmentation. In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. In the synthesized speech, we add dysarthria severity level and pause insertion mechanisms to other control parameters such as pitch, energy, and duration. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Audio samples are available at https://mohammadelc.github.io/SpeechGroupUKY/
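The results in the abstract are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' scoring pipeline, and the abstract does not say whether its percentages are absolute or relative changes.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Comparing two systems then amounts to computing this value for each system over the same test transcripts and taking the difference.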
Pages: 7382-7386
Page count: 5