Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC

Cited by: 0

Authors:

Gupta, Sakshi [1]

Shukla, Ravi S. [2]

Shukla, Rajesh K. [1]

Verma, Rajesh [3]

Affiliations:

[1] Invertis Univ, Dept Comp Sci & Engn, Bareilly, Uttar Pradesh, India

[2] Saudi Elect Univ, Dept Comp Sci, Riyadh, Saudi Arabia

[3] King Khalid Univ, Dept Elect Engn, Abha, Saudi Arabia

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2020, Vol. 11, No. 9

Keywords:

Speech; stuttering; deep learning; WMFCC; Bi-LSTM; CLASSIFICATION; DYSFLUENCIES; RECOGNITION;

DOI:

10.14569/IJACSA.2020.0110941

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Stuttering is a neurodevelopmental disorder in which the normal flow of speech is disrupted. Traditionally, speech-language pathologists assessed the extent of stuttering by counting speech disfluencies manually. Such assessments are subjective, inconsistent, time-consuming, and error-prone. The present study focuses on the objective assessment of speech disfluencies such as prolongation and syllable, word, and phrase repetition. The proposed method is based on the Weighted Mel Frequency Cepstral Coefficient (WMFCC) feature extraction algorithm and a deep-learning Bidirectional Long Short-Term Memory (Bi-LSTM) neural network for classification of stuttered events. The work uses the UCLASS stuttering dataset for analysis. The speech samples of the database are first preprocessed, manually segmented, and labeled by type of disfluency. The labeled speech samples are parameterized as Weighted MFCC feature vectors. The extracted features are then fed to the Bi-LSTM network for training and testing of the model. The effect of different hyper-parameters on classification results is examined. The test results show that the proposed method reaches a best accuracy of 96.67% in comparison with the LSTM model. Recognition accuracies of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% were achieved for the detection of fluent speech, prolongation, and syllable, word, and phrase repetition, respectively.
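
The abstract does not spell out how the MFCC features are weighted. As a minimal sketch, assuming one common formulation in which the static cepstra are combined with scaled first and second temporal derivatives, the parameterization step could look like the following. The function names, the delta window width, and the weights p and q are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based temporal derivative along axis 0 (frames).

    Edge frames are handled by repeating the first and last frame.
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]    # frame t + n
        behind = padded[width - n : width - n + num_frames]   # frame t - n
        out += n * (ahead - behind)
    return out / denom


def weighted_mfcc(mfcc: np.ndarray, p: float = 0.5, q: float = 0.25) -> np.ndarray:
    """Combine static MFCCs with weighted delta and delta-delta terms.

    Assumed form: wc = c + p * delta(c) + q * delta(delta(c)).
    The weights p and q are hypothetical placeholders.
    """
    d1 = delta(mfcc)
    d2 = delta(d1)
    return mfcc + p * d1 + q * d2


if __name__ == "__main__":
    # Stand-in for real MFCCs: 100 frames x 13 coefficients of random data.
    rng = np.random.default_rng(0)
    fake_mfcc = rng.normal(size=(100, 13))
    print(weighted_mfcc(fake_mfcc).shape)
```

Under this assumption the weighted vector has the same dimensionality as the static MFCCs, so the dynamic information is folded in without enlarging the input fed to the sequence classifier.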

Pages: 345-356

Page count: 12
