Spoken Arabic Digits Recognition Using Deep Learning

Cited by: 0
Authors
Wazir, Abdulaziz Saleh Mahfoudh B. A. [1]
Chuah, Joon Huang [1]
Affiliations
[1] Univ Malaya, VIP Res Lab, Dept Elect Engn, Fac Engn, Kuala Lumpur 50603, Malaysia
Keywords
Arabic digits; speech recognition; Deep Learning; Recurrent Neural Network (RNN); Long Short-Term Memory (LSTM)
DOI
10.1109/i2cacis.2019.8825004
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech recognition has undergone tremendous advancement over the past 50 years. The Deep Neural Network (DNN) is one of the most popular methods for speech analysis thanks to its ability to minimize the error rate in optimization problems. This research proposes an Arabic digit speech recognition model utilizing a Recurrent Neural Network (RNN). The model selects the finest speech signal representation by extracting Mel-Frequency Cepstrum Coefficients (MFCCs) after the signal has been processed for noise reduction and digit separation. Features extracted from the speech of each digit are fed into a network with Long Short-Term Memory (LSTM) cells. The LSTM cells can solve problems associated with temporal dependencies requiring long-term learning and solve the vanishing gradient problems associated with RNNs. A dataset of 1040 samples of spoken Arabic digits from different dialects is used in this study, of which 840 samples are used to train the network and the other 200 samples are used for testing. Model training is carried out on a computing system with a Graphics Processing Unit (GPU). The LSTM model's learning parameters are tuned for optimization, achieving an accuracy of 94% during model training. The testing results of the tuned model show that the LSTM model achieves 69% accuracy when recognizing spoken Arabic digits. The model has its highest accuracy, 80%, when recognizing the digit zero.
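As an illustration of the LSTM cell mentioned in the abstract, the following is a minimal sketch of the standard LSTM update for a single time step, written in plain Python. It is not the authors' implementation: the parameter layout, the helper names (`affine`, `lstm_step`, `zero_params`, `run_sequence`), the frame size of 13 coefficients, and the zero-valued weights are all assumptions made for the example. The point it shows is that the cell state is updated additively (`f * c_prev + i * g`), which is the usual explanation for why LSTMs handle long-term dependencies better than a plain RNN.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One standard LSTM time step.

    params maps a gate name ("i", "f", "o", "g") to a tuple (W, U, b).
    This layout is hypothetical, chosen only for readability.
    """
    pre = {name: affine(W, U, b, x, h_prev) for name, (W, U, b) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
    # Additive cell-state update: old state scaled by f, new content scaled by i.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def zero_params(n_in, n_hidden):
    """Placeholder all-zero weights; a trained model would learn these."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    return {
        name: (zeros(n_hidden, n_in), zeros(n_hidden, n_hidden), [0.0] * n_hidden)
        for name in "ifog"
    }


def run_sequence(frames, params, n_hidden):
    """Feed a sequence of feature frames (e.g. MFCC vectors) through the cell."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in frames:
        h, c = lstm_step(x, h, c, params)
    return h  # final hidden state, which a classifier layer could consume


if __name__ == "__main__":
    # Assumed shape: 20 frames of 13 coefficients each (13 is a common MFCC
    # choice, not a value stated in the abstract).
    frames = [[0.1] * 13 for _ in range(20)]
    print(run_sequence(frames, zero_params(13, 4), n_hidden=4))
```

In a full recognizer, the final hidden state would typically be passed to a ten-way output layer, one class per digit; that layer and the training loop are omitted here.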
Pages: 339-344
Number of pages: 6