Comparing the performance of classic voice-driven assistive systems for dysarthric speech

Authors
Zheng, Wei-Zhong [1]
Han, Ji-Yan [1]
Cheng, Hsiu-Lien [1]
Chu, Wei-Chug [1]
Chen, Ko-Chiang [1]
Lai, Ying-Hui [1,2]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Biomed Engn, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Med Device Innovat & Translat Ctr, Taipei, Taiwan
Keywords
Dysarthria; Voice-driven assistive; Speech intelligibility; Deep learning; INTELLIGIBILITY; FREQUENCY
DOI
10.1016/j.bspc.2022.104447
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Voice-driven communication assistive systems, namely speech enhancement (SE), voice conversion (VC), and automatic speech recognition with text-to-speech (ASR-TTS), are recognized approaches for improving the speech intelligibility of dysarthric speakers. However, it is unclear which approach performs better for patients with moderate dysarthria. This study compared the benefits of these three classic, different types of voice-driven assistive systems for the speech intelligibility of dysarthric patients under identical test conditions. Fourteen patients with mild-to-severe dysarthria and five speakers with normal speech were invited to record the training sets for these systems. Five patients with moderate dysarthria were selected to record two additional testing sets, which were used to evaluate the systems' benefits. Evaluation metrics based on Google Automatic Speech Recognition (Google ASR) and listening tests were used to assess each system's speech intelligibility and quality. The speech intelligibility scores produced by Google ASR were 7.0%, 22.9%, and 93.8% for the SE, VC, and ASR-TTS systems, respectively. In the listening test, speech intelligibility was 38.7%, 40.5%, and 95.5%, and speech quality scores were 1.81, 2.18, and 4.56, for the SE, VC, and ASR-TTS systems, respectively. The ASR-TTS system thus performed better than the SE and VC systems. Furthermore, t-distributed stochastic neighbor embedding (t-SNE) analysis was used to further compare the systems. The t-SNE results indicated that the phonetic posteriorgram features used in the ASR-TTS system provided more stable performance than the speech features (log-power spectrum and spectra) used in the SE and VC systems. These results suggest that ASR-TTS is a promising system for improving the speech intelligibility and quality of patients with moderate dysarthria in future applications.
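As an illustration of how an ASR-based intelligibility percentage of the kind reported in the abstract could be computed, the following is a minimal sketch. It assumes the score is a pooled token-level recognition accuracy (100% minus the error rate from edit distance), which the abstract does not specify; the function names `edit_distance` and `recognition_accuracy`, the whitespace tokenization, and the example sentences are all hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def recognition_accuracy(references, hypotheses):
    """Pooled token accuracy in percent (assumed metric, floored at 0)."""
    total_tokens = sum(len(r.split()) for r in references)
    total_errors = sum(
        edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    return max(0.0, 100.0 * (1.0 - total_errors / total_tokens))


# Hypothetical reference prompts and ASR transcripts of processed speech.
references = ["turn on the light", "open the door"]
hypotheses = ["turn on the light", "open a door"]
score = recognition_accuracy(references, hypotheses)
```

For languages written without spaces between words, the same sketch could operate on characters instead of whitespace-separated tokens; that choice is likewise an assumption here.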
Pages: 13