Automatic prediction of intelligible speaking rate for individuals with ALS from speech acoustic and articulatory samples

Cited by: 65
Authors
Wang, Jun [1 ,2 ]
Kothalkar, Prasanna V. [1 ]
Kim, Myungjong [1 ]
Bandini, Andrea [3 ]
Cao, Beiming [1 ]
Yunusova, Yana [3 ]
Campbell, Thomas F. [2 ]
Heitzman, Daragh [4 ]
Green, Jordan R. [5 ]
Affiliations
[1] Univ Texas Dallas, Dept Bioengn, Speech Disorders & Technol Lab, BSB 13-302, 800 W Campbell Rd, Richardson, TX 75080 USA
[2] Univ Texas Dallas, Callier Ctr Commun Disorders, Richardson, TX 75083 USA
[3] Univ Toronto, Dept Speech Language Pathol, Toronto, ON, Canada
[4] MDA ALS Ctr, Houston, TX USA
[5] MGH Inst Hlth Profess, Dept Commun Sci & Disorders, Boston, MA USA
Funding
U.S. National Institutes of Health;
Keywords
amyotrophic lateral sclerosis; dysarthria; speech kinematics; intelligible speaking rate; machine learning; support vector machine; AMYOTROPHIC-LATERAL-SCLEROSIS; PARKINSONS-DISEASE; TUTORIAL; BULBAR; TONGUE;
DOI
10.1080/17549507.2018.1508499
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
Purpose: This research aimed to automatically predict intelligible speaking rate for individuals with amyotrophic lateral sclerosis (ALS) from speech acoustic and articulatory samples. Method: Twelve participants with ALS and two healthy control participants produced a total of 1831 phrases. An NDI Wave system was used to record tongue and lip movements synchronously with the acoustic signal. A machine learning algorithm (a support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. Result: Acoustic, lip movement, and tongue movement information, used separately, yielded R² values of 0.652, 0.660, and 0.678 and root mean squared errors (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information yielded the highest R² (0.712) and the lowest RMSE (37.562 WPM). Conclusion: The results revealed that the proposed analyses predicted each participant's intelligible speaking rate with reasonably high accuracy from the acoustic and/or articulatory features of a single short speech sample. With further development, these analyses may be well suited for clinical applications that require automatic prediction of speech severity.
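The pipeline the abstract describes (per-phrase feature extraction followed by support vector regression onto a continuous intelligible-speaking-rate target) can be sketched as below. This is an illustrative reconstruction, not the authors' implementation: the feature matrices, their dimensionalities, the SVR hyperparameters, and the cross-validation scheme are all assumptions; only the phrase count (1831) and the two reported metrics (R² and RMSE in WPM) come from the abstract.

```python
# Minimal sketch of the setup described in the abstract: support vector
# regression from acoustic and/or articulatory features to intelligible
# speaking rate (speech intelligibility x speaking rate, in WPM).
# Feature choices below are assumptions, not the authors' exact pipeline.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)

# Placeholder feature matrices (one row per phrase); in the study these
# would be derived from the acoustic signal and the NDI Wave tongue/lip
# kinematic recordings. Dimensionalities here are arbitrary.
n_phrases = 1831
X_acoustic = rng.normal(size=(n_phrases, 39))  # e.g. MFCC-style features (assumed)
X_lip = rng.normal(size=(n_phrases, 12))       # lip-movement features (assumed)
X_tongue = rng.normal(size=(n_phrases, 12))    # tongue-movement features (assumed)
y = rng.uniform(0, 250, size=n_phrases)        # intelligible speaking rate in WPM

# Combine modalities by simple feature concatenation (one plausible choice
# for the "acoustic + lip + tongue" condition).
X_all = np.hstack([X_acoustic, X_lip, X_tongue])

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))

# Cross-validated predictions, then the two metrics reported in the abstract.
# With these random placeholders the printed values are meaningless; real
# features are required to reproduce the reported R² and RMSE.
y_pred = cross_val_predict(model, X_all, y, cv=5)
print("R^2 :", r2_score(y, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y, y_pred)), "WPM")
```

Swapping `X_all` for `X_acoustic`, `X_lip`, or `X_tongue` alone would mirror the single-modality comparison reported in the abstract.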
Pages: 669-679
Page count: 11