Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Cited by: 0
Authors
Aurobindo, S. [1 ]
Prakash, R. [1 ]
Rajeshkumar, M. [1 ]
Affiliations
[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; Intelligibility; Recognition; Features; Phase
DOI
10.1016/j.rineng.2025.104561
Chinese Library Classification
T [Industrial Technology]
Discipline code
08
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of TORGO dysarthric speech to facilitate the automatic detection and classification of dysarthria severity using deep convolutional neural networks (DCNNs). Dysarthria detection is addressed in experiment E1, a binary classification task distinguishing dysarthric speech from a healthy control class. Experiment E2 employs a multiclass classification scheme, categorizing speech into very low, low, and medium severity classes plus a healthy class. The time-frequency image representations of the speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. In addition, the proposed work analyzes the frequency behavior of the time-frequency image representations by bifurcating the standard-form images into two halves, one representing low frequencies and the other high frequencies. With this approach, the low-frequency half of the bifurcated cepstrogram outperforms all other features, achieving validation accuracies of 99.53% for E1 and 97.85% for E2 and surpassing the existing benchmark on the TORGO dataset by 6% for E1 and 1.7% for E2. The best-ranked feature for E1 and E2 was then applied to the noise-reduced UASpeech dataset, where the DCNN model achieved 98.75% accuracy for E1 and 95.98% for E2, demonstrating its effectiveness on a new dataset.
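The frequency-bifurcation step described in the abstract can be illustrated with a minimal sketch. The Python code below assumes librosa for audio processing; the sampling rate, STFT parameters, and function names are illustrative choices and are not taken from the paper. It computes a log-magnitude spectrogram and splits it along the frequency axis into low- and high-frequency halves, which could then be saved as images for the DCNN.

```python
# Minimal sketch of the bifurcated standard-form representation
# (assumed parameters: 16 kHz audio, 512-point STFT).
import numpy as np
import librosa

def bifurcated_spectrogram(wav_path, sr=16000, n_fft=512, hop_length=256):
    """Log-magnitude spectrogram split into low- and high-frequency halves."""
    y, _ = librosa.load(wav_path, sr=sr)                      # load and resample
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    S_db = librosa.amplitude_to_db(S, ref=np.max)             # (n_fft//2 + 1, frames)
    mid = S_db.shape[0] // 2
    low_half = S_db[:mid, :]     # bins covering roughly 0 .. sr/4
    high_half = S_db[mid:, :]    # bins covering roughly sr/4 .. sr/2
    return low_half, high_half
```

A correspondingly minimal DCNN classifier for the E1 (binary) and E2 (four-class severity) tasks might look as follows, assuming Keras/TensorFlow and 224x224 single-channel inputs; the paper's actual network architecture and hyperparameters are not specified in the abstract.

```python
# Hypothetical DCNN sketch; only the class counts (2 for E1, 4 for E2) come from the abstract.
import tensorflow as tf

def build_dcnn(num_classes, input_shape=(224, 224, 1)):
    """Small convolutional classifier with a softmax head sized for E1 or E2."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

e1_model = build_dcnn(num_classes=2)   # dysarthric vs. healthy control
e2_model = build_dcnn(num_classes=4)   # very low, low, medium severity, healthy
```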
Pages: 14