Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC

Cited by: 0
Authors
Gupta, Sakshi [1 ]
Shukla, Ravi S. [2 ]
Shukla, Rajesh K. [1 ]
Verma, Rajesh [3 ]
Affiliations
[1] Invertis Univ, Dept Comp Sci & Engn, Bareilly, Uttar Pradesh, India
[2] Saudi Elect Univ, Dept Comp Sci, Riyadh, Saudi Arabia
[3] King Khalid Univ, Dept Elect Engn, Abha, Saudi Arabia
Keywords
Speech; stuttering; deep learning; WMFCC; Bi-LSTM; CLASSIFICATION; DYSFLUENCIES; RECOGNITION;
DOI
10.14569/IJACSA.2020.0110941
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Stuttering is a neurodevelopmental disorder in which the normal flow of speech is disrupted. Traditionally, speech-language pathologists assessed the severity of stuttering by counting speech disfluencies manually. Such assessments are subjective, inconsistent, time-consuming, and error-prone. The present study focuses on the objective assessment of speech disfluencies, namely prolongation and syllable, word, and phrase repetition. The proposed method combines the Weighted Mel-Frequency Cepstral Coefficient (WMFCC) feature extraction algorithm with a deep-learning Bidirectional Long Short-Term Memory (Bi-LSTM) neural network for classifying stuttered events. The work uses the UCLASS stuttering dataset. The speech samples are first preprocessed, manually segmented, and labeled by disfluency type. The labeled samples are then parameterized as WMFCC feature vectors, which are fed to the Bi-LSTM network for training and testing. The effect of different hyperparameters on the classification results is examined. The test results show that the proposed method reaches a best accuracy of 96.67%, higher than that of the LSTM model. Recognition accuracies of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% were achieved for the detection of fluent speech, prolongation, syllable repetition, word repetition, and phrase repetition, respectively.
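The abstract does not state how the WMFCC vectors are computed. As a rough illustration only, the sketch below uses one formulation commonly described for weighted MFCC: the static coefficients plus weighted first and second temporal derivatives, so the feature dimension stays the same as for plain MFCC. The function names, the weights `p` and `q`, and the regression window `width` are assumptions made for this example and are not taken from the paper.

```python
from typing import List

Frame = List[float]  # one frame of static MFCC coefficients


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based temporal derivative; edge frames are repeated as padding."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: List[Frame] = []
    for t in range(n_frames):
        row: Frame = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def weighted_mfcc(mfcc: List[Frame], p: float = 0.5, q: float = 0.25,
                  width: int = 2) -> List[Frame]:
    """Combine static, delta, and delta-delta terms as c + p*d1 + q*d2.

    The weights p and q are illustrative placeholders, not values from the paper.
    """
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)
    return [
        [c + p * a + q * b for c, a, b in zip(c_row, a_row, b_row)]
        for c_row, a_row, b_row in zip(mfcc, d1, d2)
    ]
```

In a pipeline like the one the abstract outlines, the static MFCC frames would come from a speech front end applied to each labeled segment, and the resulting per-frame vectors would form the input sequence of the Bi-LSTM classifier.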
Pages: 345-356
Page count: 12