Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Authors
Aurobindo, S. [1]
Prakash, R. [1]
Rajeshkumar, M. [1]
Affiliations
[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; INTELLIGIBILITY; RECOGNITION; FEATURES; PHASE
DOI
10.1016/j.rineng.2025.104561
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of TORGO dysarthric speech to support automatic detection of dysarthria and classification of its severity with deep convolutional neural networks (DCNN). Experiment E1 treats dysarthria detection as a binary classification task with a dysarthria class and a healthy control class. Experiment E2 is a multiclass classification task with very low, low, medium, and healthy classes. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. The study also analyzes the frequency behavior of the time-frequency image representations by bifurcating the standard-form images into two halves, one representing low frequencies and the other representing high frequencies. With this approach, the low-frequency half of the bifurcated standard-form cepstrogram outperforms all other features, achieving a validation accuracy of 99.53% for E1 and 97.85% for E2, which surpasses existing benchmark work on the TORGO dataset by 6% for E1 and by 1.7% for E2. The best-ranked feature of E1 and E2 was then applied to the noise-reduced UASpeech dataset, where the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
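The bifurcation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain log-magnitude STFT spectrogram computed with NumPy, a synthetic two-tone signal in place of TORGO speech, and a split at the midpoint of the frequency axis. The function names `spectrogram` and `bifurcate`, the frame length, and the hop size are all hypothetical choices; how the paper splits the cepstrogram is not stated in the abstract.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT image with rows = frequency bins, columns = frames.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_bins)
    return np.log(magnitude + 1e-10).T               # (n_bins, n_frames)


def bifurcate(image):
    """Split a time-frequency image into low- and high-frequency halves.

    Assumes row 0 is the lowest frequency; the midpoint split is an
    assumption made for illustration.
    """
    mid = image.shape[0] // 2
    return image[:mid, :], image[mid:, :]


# Synthetic stand-in for a speech recording: one low tone and one high tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)

image = spectrogram(signal)
low_half, high_half = bifurcate(image)
```

Each half could then be saved as an image and fed to a classifier separately, which is one way to compare how much discriminative information the low and high bands carry.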
Pages: 14