Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Cited by: 0
Authors
Aurobindo, S. [1 ]
Prakash, R. [1 ]
Rajeshkumar, M. [1 ]
Affiliation
[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; INTELLIGIBILITY; RECOGNITION; FEATURES; PHASE
DOI
10.1016/j.rineng.2025.104561
CLC classification
T [Industrial Technology]
Discipline code
08
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of TORGO dysarthric speech to facilitate the automatic detection and classification of dysarthria severity using deep convolutional neural networks (DCNN). Dysarthria detection is addressed in experiment E1, a binary classification task with a dysarthric class and a healthy control class. Experiment E2 employs a multiclass classification approach, categorizing data into very low, low, medium, and healthy classes. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. This work also analyzes the frequency behavior of the time-frequency image representations by bifurcating the standard-form images into two halves: one representing low frequencies and the other high frequencies. With this approach, the bifurcated standard-form cepstrogram with the low-frequency half outperforms all other features, achieving a validation accuracy of 99.53% for E1 and 97.85% for E2, surpassing existing benchmark work by 6% for E1 and 1.7% for E2 on the TORGO dataset. The best-ranked features of E1 and E2 were applied to the noise-reduced UASpeech dataset, where the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
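The bifurcation idea described in the abstract — splitting a standard-form time-frequency image along the frequency axis into a low-frequency half and a high-frequency half — can be sketched as follows. This is a minimal numpy-only illustration, not the authors' pipeline: the frame length, hop size, windowing, and synthetic test signal are assumptions for demonstration.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a simple Hann-windowed STFT (numpy only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft gives frame_len // 2 + 1 frequency bins per frame;
    # transpose so rows = frequency bins, columns = time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

def bifurcate(spec):
    """Split a spectrogram image along the frequency axis into a
    low-frequency half and a high-frequency half."""
    mid = spec.shape[0] // 2
    return spec[:mid, :], spec[mid:, :]

# Hypothetical usage on a synthetic 1 s, 16 kHz signal: a 440 Hz tone,
# whose energy should land almost entirely in the low-frequency half.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)
spec = stft_magnitude(x)
low, high = bifurcate(spec)
```

Each half is then treated as an independent image input to the classifier, which is how the abstract's "bifurcated standard-form cepstrogram with low frequency" feature is obtained (there from a cepstrogram rather than the plain spectrogram used here).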
Pages: 14