Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Times cited: 0
Authors
Aurobindo, S. [1]
Prakash, R. [1]
Rajeshkumar, M. [1]
Affiliation
[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; INTELLIGIBILITY; RECOGNITION; FEATURES; PHASE;
DOI
10.1016/j.rineng.2025.104561
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of speech from the TORGO dysarthric speech dataset to enable automatic detection of dysarthria and classification of its severity using deep convolutional neural networks (DCNNs). Experiment E1 addresses dysarthria detection as a binary classification task with a dysarthric class and a healthy control class. Experiment E2 is a multiclass task that categorizes the data into very low, low, medium, and healthy classes. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. This work also examines the frequency behavior of the time-frequency image representations by bifurcating each standard-form image into two halves, one representing the low frequencies and the other the high frequencies. With this approach, the low-frequency half of the bifurcated standard-form cepstrogram outperforms all other features, achieving validation accuracies of 99.53% for E1 and 97.85% for E2 on the TORGO dataset and surpassing the existing benchmark work by 6% for E1 and 1.7% for E2. When the best-ranked feature for E1 and E2 was applied to the noise-reduced UASpeech dataset, the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
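The bifurcation idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the frame length (512), hop (128), Hann window, the real-cepstrum definition, and the helper names `stft_magnitude`, `cepstrogram`, and `bifurcate` are all hypothetical choices made for illustration, since the record does not state the paper's feature-extraction settings. The "low-frequency half" is assumed to be the lower half of the vertical (frequency or quefrency) bins of the image.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=128):
    """Magnitude spectrogram (assumed settings): rows = frequency bins from low to high, cols = frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop  # assumes len(signal) >= frame_len
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))  # shape: (frame_len // 2 + 1, n_frames)

def cepstrogram(signal, frame_len=512, hop=128, eps=1e-10):
    """Real cepstrum per frame: inverse FFT of the log-magnitude spectrum."""
    mag = stft_magnitude(signal, frame_len, hop)
    ceps = np.fft.irfft(np.log(mag + eps), n=frame_len, axis=0)
    # The real cepstrum is symmetric, so only the first half of the quefrency axis is kept.
    return ceps[: frame_len // 2 + 1]

def bifurcate(image):
    """Hypothetical helper: split a (bins x frames) image into lower-bin and upper-bin halves."""
    mid = image.shape[0] // 2
    return image[:mid], image[mid:]

# Usage on a synthetic 1-second, 16 kHz tone (a stand-in for a speech recording).
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)
c = cepstrogram(x)
low, high = bifurcate(c)
print(c.shape, low.shape, high.shape)  # (257, 122) (128, 122) (129, 122)
```

In a DCNN setting, each half would then be rendered or normalized as a separate input image; how the paper does that is not described in this record.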
Pages: 14