Automatic prediction of intelligible speaking rate for individuals with ALS from speech acoustic and articulatory samples

Cited: 65
Authors
Wang, Jun [1 ,2 ]
Kothalkar, Prasanna V. [1 ]
Kim, Myungjong [1 ]
Bandini, Andrea [3 ]
Cao, Beiming [1 ]
Yunusova, Yana [3 ]
Campbell, Thomas F. [2 ]
Heitzman, Daragh [4 ]
Green, Jordan R. [5 ]
Affiliations
[1] Dept Bioengn Speech Disorders & Technol Lab, BSB 13-302,800 W Campbell Rd, Richardson, TX 75080 USA
[2] Univ Texas Dallas, Callier Ctr Commun Disorders, Richardson, TX 75083 USA
[3] Univ Toronto, Dept Speech Language Pathol, Toronto, ON, Canada
[4] MDA ALS Ctr, Houston, TX USA
[5] MGH Inst Hlth Profess, Dept Commun Sci & Disorders, Boston, MA USA
Funding
US National Institutes of Health (NIH);
Keywords
amyotrophic lateral sclerosis; dysarthria; speech kinematics; intelligible speaking rate; machine learning; support vector machine; AMYOTROPHIC-LATERAL-SCLEROSIS; PARKINSONS-DISEASE; TUTORIAL; BULBAR; TONGUE;
DOI
10.1080/17549507.2018.1508499
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline codes
100104 ; 100213 ;
Abstract
Purpose: This study aimed to automatically predict intelligible speaking rate for individuals with amyotrophic lateral sclerosis (ALS) from speech acoustic and articulatory samples. Method: Twelve participants with ALS and two healthy control speakers produced a total of 1831 phrases. The NDI Wave system was used to collect tongue movement, lip movement, and acoustic data synchronously. A machine learning algorithm (a support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. Result: Acoustic, lip movement, and tongue movement information, taken separately, yielded R² values of 0.652, 0.660, and 0.678, with Root Mean Squared Errors (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information yielded the highest R² (0.712) and the lowest RMSE (37.562 WPM). Conclusion: The results show that the proposed analyses predicted participants' intelligible speaking rate with reasonably high accuracy from the acoustic and/or articulatory features of a single short speech sample. With further development, these analyses may be well suited for clinical applications that require automatic prediction of speech severity.
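The modelling pipeline the abstract describes — support vector regression from combined acoustic and articulatory features to intelligible speaking rate in WPM, evaluated with R² and RMSE — can be sketched as below. This is not the authors' code: the feature blocks, dimensions, and data here are synthetic placeholders standing in for the acoustic and NDI Wave tongue/lip movement features used in the study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n_samples = 400  # the study used 1831 phrases

# Placeholder feature blocks: acoustic summaries plus tongue and lip
# movement statistics (the study's three information sources).
acoustic = rng.normal(size=(n_samples, 13))
tongue = rng.normal(size=(n_samples, 6))
lips = rng.normal(size=(n_samples, 6))
X = np.hstack([acoustic, tongue, lips])  # combined feature set

# Synthetic target: intelligible speaking rate in words per minute.
w = rng.normal(size=X.shape[1])
y = 120.0 + X @ w * 10.0 + rng.normal(scale=5.0, size=n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# SVM regression with feature standardisation; kernel and C are
# illustrative choices, not the paper's reported settings.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

# Report the same metrics the paper uses: R² and RMSE in WPM.
r2 = r2_score(y_te, pred)
rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.1f} WPM")
```

Training separate models on the `acoustic`, `tongue`, and `lips` blocks alone, then on their concatenation, mirrors the paper's comparison of individual versus combined information sources.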
Pages: 669-679 (11 pages)
Related papers
50 records in total
  • [31] Using automatic acoustic analysis to reveal disruptions to speech articulation in individuals at risk for psychosis
    Hitczenko, Kasia
    Segal, Yael
    Keshet, Joseph
    Mittal, Vijay
    Goldrick, Matthew
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [32] Acoustic analyses of trained singers perceptually identified from speaking samples
    Rothman, HB
    Brown, WS
    Sapienza, CM
    Morris, RJ
    JOURNAL OF VOICE, 2001, 15 (01) : 25 - 35
  • [33] Intra-Sentential Speaking Rate Control in Neural Text-To-Speech for Automatic Dubbing
    Sharma, Mayank
    Virkar, Yogesh
    Federico, Marcello
    Barra-Chicote, Roberto
    Enyedi, Robert
    INTERSPEECH 2021, 2021, : 3151 - 3155
  • [34] Automatic Head Motion Prediction from Speech Data
    Hofer, Gregor
    Shimodaira, Hiroshi
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 865 - 868
  • [35] Acoustic feature selection for automatic emotion recognition from speech
    Rong, Jia
    Li, Gang
    Chen, Yi-Ping Phoebe
    INFORMATION PROCESSING & MANAGEMENT, 2009, 45 (03) : 315 - 328
  • [36] Study of automatic prediction of emotion from handwriting samples
    Fairhurst, Michael
    Erbilek, Meryem
    Li, Cheng
    IET BIOMETRICS, 2015, 4 (02) : 90 - 97
  • [37] THE INFLUENCE OF SPEAKING RATE ON VOWEL SPACE AND SPEECH-INTELLIGIBILITY FOR INDIVIDUALS WITH AMYOTROPHIC-LATERAL-SCLEROSIS
    TURNER, GS
    TJADEN, K
    WEISMER, G
    JOURNAL OF SPEECH AND HEARING RESEARCH, 1995, 38 (05): : 1001 - 1013
  • [38] INTEGRATED AUTOMATIC EXPRESSION PREDICTION AND SPEECH SYNTHESIS FROM TEXT
    Chen, Langzhou
    Gales, Mark J. F.
    Braunschweiler, Norbert
    Akamine, Masami
    Knill, Kate
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7977 - 7981
  • [39] Acoustic Model Merging Using Acoustic Models from Multilingual Speakers for Automatic Speech Recognition
    Tan, Tien-Ping
    Besacier, Laurent
    Lecouteux, Benjamin
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014, : 42 - 45
  • [40] ANALYSIS AND PREDICTION OF HEART RATE USING SPEECH FEATURES FROM NATURAL SPEECH
    Smith, Jennifer
    Tsiartas, Andreas
    Shriberg, Elizabeth
    Kathol, Andreas
    Willoughby, Adrian
    de Zambotti, Massimiliano
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 989 - 993