Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Cited by: 12
Authors
Korzekwa, Daniel [1]
Barra-Chicote, Roberto [1]
Kostek, Bozena [2]
Drugman, Thomas [1]
Lajszczak, Mateusz [1]
Affiliations
[1] Amazon TTS Res, Cambridge, England
[2] Gdansk Univ Technol, Fac ETI, Gdansk, Poland
Source
Keywords
dysarthria detection; speech recognition; speech synthesis; interpretable deep learning models;
DOI
10.21437/Interspeech.2019-1206
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not provide interpretable outputs. On the contrary, we show that this latent space successfully encodes interpretable characteristics of dysarthria, is effective at detecting dysarthria, and that manipulation of the latent space allows the model to reconstruct healthy speech from dysarthric speech. This work can help patients and speech pathologists improve their understanding of the condition, lead to more accurate diagnoses, and aid in reconstructing healthy speech for afflicted patients.
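The abstract does not state the training objective, so the following is only a minimal sketch of how a joint detection-plus-reconstruction loss is commonly formed in multi-task learning. Every name here (`multitask_loss`, `detection_weight`, the use of mean squared error and binary cross-entropy) is an assumption made for illustration, not the authors' formulation.

```python
import math


def mse(predicted, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    # Clamp the probability so that log() never receives 0.
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(reconstructed, target_frames, dysarthria_prob,
                   dysarthria_label, detection_weight=1.0):
    """Hypothetical joint objective: reconstruction error plus a weighted
    detection error. The weighting scheme is an assumption."""
    reconstruction_term = mse(reconstructed, target_frames)
    detection_term = binary_cross_entropy(dysarthria_prob, dysarthria_label)
    return reconstruction_term + detection_weight * detection_term


# Toy usage: two reconstructed feature values and one detection probability.
loss = multitask_loss([0.1, 0.9], [0.0, 1.0], dysarthria_prob=0.8,
                      dysarthria_label=1)
print(loss)
```

Under this assumed setup, both terms would be computed from the same shared representation, which is one way a single latent space could be encouraged to serve detection and reconstruction at once.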
Pages: 3890 - 3894
Page count: 5
Related papers
50 in total
  • [1] Comparative analysis of deep learning models for dysarthric speech detection
    P. Shanmugapriya
    V. Mohan
    Soft Computing, 2024, 28 : 5683 - 5698
  • [2] Comparative analysis of deep learning models for dysarthric speech detection
    Shanmugapriya, P.
    Mohan, V.
    SOFT COMPUTING, 2024, 28 (06) : 5683 - 5698
  • [3] An Interpretable Deep Learning Model for Speech Activity Detection Using Electrocorticographic Signals
    Stuart, Morgan
    Lesaja, Srdjan
    Shih, Jerry J.
    Schultz, Tanja
    Manic, Milos
    Krusienski, Dean J.
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2022, 30 : 2783 - 2792
  • [4] Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning
    Keshvari, Fatemeh
    Toroghi, Rahil Mahdian
    Zareian, Hassan
    arXiv,
  • [5] Interpretable Objective Assessment of Dysarthric Speech based on Deep Neural Networks
    Tu, Ming
    Berisha, Visar
    Liss, Julie
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1849 - 1853
  • [6] Scalogram based performance comparison of deep learning architectures for dysarthric speech detection
    Shabber, Shaik Mulla
    Sumesh, E. P.
    Ramachandran, Vidhya Lavanya
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (05)
  • [7] Dysarthric Speech Recognition Based on Deep Metric Learning
    Takashima, Yuki
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    INTERSPEECH 2020, 2020, : 4796 - 4800
  • [8] EXPERIMENTAL INVESTIGATION ON STFT PHASE REPRESENTATIONS FOR DEEP LEARNING-BASED DYSARTHRIC SPEECH DETECTION
    Janbakhshi, Parvaneh
    Kodrasi, Ina
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6477 - 6481
  • [9] Subspace-Based Learning for Automatic Dysarthric Speech Detection
    Janbakhshi, Parvaneh
    Kodrasi, Ina
    Bourlard, Herve
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 (28) : 96 - 100
  • [10] Repetition Detection in Dysarthric Speech
    Diwakar, G.
    Karjigi, Veena
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 1150 - 1154