Automatic prediction of intelligible speaking rate for individuals with ALS from speech acoustic and articulatory samples

Cited by: 65
Authors
Wang, Jun [1, 2]
Kothalkar, Prasanna V. [1]
Kim, Myungjong [1]
Bandini, Andrea [3]
Cao, Beiming [1]
Yunusova, Yana [3]
Campbell, Thomas F. [2]
Heitzman, Daragh [4]
Green, Jordan R. [5]
Affiliations
[1] Dept Bioengn Speech Disorders & Technol Lab, BSB 13-302, 800 W Campbell Rd, Richardson, TX 75080 USA
[2] Univ Texas Dallas, Callier Ctr Commun Disorders, Richardson, TX 75083 USA
[3] Univ Toronto, Dept Speech Language Pathol, Toronto, ON, Canada
[4] MDA ALS Ctr, Houston, TX USA
[5] MGH Inst Hlth Profess, Dept Commun Sci & Disorders, Boston, MA USA
Funding
National Institutes of Health (USA)
Keywords
amyotrophic lateral sclerosis; dysarthria; speech kinematics; intelligible speaking rate; machine learning; support vector machine; AMYOTROPHIC-LATERAL-SCLEROSIS; PARKINSONS-DISEASE; TUTORIAL; BULBAR; TONGUE;
DOI
10.1080/17549507.2018.1508499
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Purpose: This research aimed to automatically predict intelligible speaking rate for individuals with Amyotrophic Lateral Sclerosis (ALS) based on speech acoustic and articulatory samples. Method: Twelve participants with ALS and two normal subjects produced a total of 1831 phrases. The NDI Wave system was used to collect tongue and lip movement and acoustic data synchronously. A machine learning algorithm (i.e. support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. Result: Used separately, acoustic, lip movement, and tongue movement information yielded an R² of 0.652, 0.660, and 0.678 and a Root Mean Squared Error (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information yielded the highest R² (0.712) and the lowest RMSE (37.562 WPM). Conclusion: The results revealed that the proposed analyses predicted the intelligible speaking rate of a participant with reasonably high accuracy by extracting acoustic and/or articulatory features from one short speech sample. With further development, the analyses may be well-suited for clinical applications that require automatic speech severity prediction.
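The quantities named in the abstract can be illustrated with a minimal sketch. It assumes intelligibility is expressed as a percentage and that R² is the coefficient of determination (the abstract does not say which definition was used). The function names and the toy values are hypothetical and are not data from the study; the support vector machine itself is not reproduced here.

```python
import math


def intelligible_speaking_rate(intelligibility_pct, speaking_rate_wpm):
    """Intelligible speaking rate in WPM.

    Assumption: intelligibility is given as a percentage (0-100), so the
    product 'speech intelligibility x speaking rate' is scaled by 1/100.
    """
    return (intelligibility_pct / 100.0) * speaking_rate_wpm


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical (intelligibility %, speaking rate WPM) pairs, for illustration only.
toy_samples = [(100, 180), (90, 150), (50, 100), (20, 60)]
actual_isr = [intelligible_speaking_rate(i, r) for i, r in toy_samples]

# Hypothetical model outputs standing in for a regressor's predictions.
predicted_isr = [170.0, 140.0, 60.0, 20.0]

print("RMSE (WPM):", rmse(actual_isr, predicted_isr))
print("R^2:", r_squared(actual_isr, predicted_isr))
```

A lower RMSE and a higher R² both indicate predictions closer to the measured intelligible speaking rate, which is how the abstract compares the acoustic, lip, tongue, and combined feature sets.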
Pages: 669-679
Page count: 11