Amharic spoken digits recognition using convolutional neural network

Authors
Ayall, Tewodros Alemu [1, 4]
Zhou, Changjun [1]
Liu, Huawen [2]
Brhanemeskel, Getnet Mezgebu [3]
Abate, Solomon Teferra [3]
Adjeisah, Michael [1]
Affiliations
[1] Zhejiang Normal University, School of Computer Science and Technology, Jinhua, People's Republic of China
[2] Shaoxing University, Department of Computer Science, Shaoxing, People's Republic of China
[3] Addis Ababa University, School of Information Science, Addis Ababa, Ethiopia
[4] University of Aberdeen, Interdisciplinary Centre for Data and AI, School of Natural and Computing Sciences, Aberdeen AB24 3UE, Scotland
Keywords
Automatic speech recognition; Spoken digit recognition; Amharic spoken digits recognition; Convolutional neural network; Speech feature extraction; Speech recognition
DOI
10.1186/s40537-024-00910-z
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Spoken digits recognition (SDR) is a type of supervised automatic speech recognition that is required in various human-machine interaction applications. It is used in phone-based services such as dialing systems, certain banking operations, airline reservation systems, and price extraction. However, designing an SDR system is challenging: it requires a labeled audio dataset, an appropriate feature extraction method, and a well-performing model. Although SDR has been studied for several languages, such as English, Arabic, and Urdu, no Amharic spoken digits dataset (AmSDD) has been available for building an Amharic spoken digits recognition (AmSDR) model for Amharic, the official working language of the government of Ethiopia. Therefore, in this study, we developed a new AmSDD containing 12,000 utterances of the digits 0 (Zaero) to 9 (zet'enyi), recorded from 120 volunteer speakers of different age groups, genders, and dialects, each of whom repeated every digit ten times. Mel-frequency cepstral coefficients (MFCCs) and Mel-spectrogram feature extraction methods were used to extract trainable features from the speech signal. We conducted several experiments on building the AmSDR model from the AmSDD, using classical supervised learning algorithms, namely Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF), as baselines. To further improve AmSDR recognition performance, we propose a three-layer Convolutional Neural Network (CNN) architecture with batch normalization. The experimental results show that the proposed CNN model outperforms the baseline algorithms, achieving accuracies of 99% and 98% with MFCC and Mel-spectrogram features, respectively.
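The two feature types named above come from one standard pipeline: split the signal into overlapping frames, window each frame, take its power spectrum, pool the spectrum with triangular filters spaced on the mel scale, and take the logarithm (a log-Mel-spectrogram). MFCCs are then the discrete cosine transform (DCT) of each log-mel frame. The following is a minimal pure-Python sketch of that textbook pipeline, not the authors' implementation. The function names, the HTK-style mel formula, the 8 kHz sample rate, 256-sample frames with a 128-sample hop, 26 mel bands and 13 coefficients are all illustrative assumptions; pre-emphasis and liftering are omitted, and the naive DFT is only suitable for small demonstrations.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed variant): m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_len, hop):
    # Overlapping frames; trailing samples that do not fill a whole frame are dropped
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame, n_fft):
    # Hamming-window the frame, then a naive O(n * n_fft) DFT (zero-padded to n_fft);
    # returns |X[k]|^2 / n_fft for bins k = 0 .. n_fft // 2
    n = len(frame)
    xw = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(xw[i] * math.cos(2 * math.pi * k * i / n_fft) for i in range(n))
        im = -sum(xw[i] * math.sin(2 * math.pi * k * i / n_fft) for i in range(n))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_mels, n_fft, sr):
    # n_mels triangular filters with centres equally spaced on the mel scale from 0 Hz to sr / 2
    top = hz_to_mel(sr / 2)
    mel_pts = [i * top / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mel_pts]
    fb = []
    for j in range(1, n_mels + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):  # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge, peak of 1.0 at the centre bin
            row[k] = (right - k) / (right - centre)
        fb.append(row)
    return fb


def log_mel_spectrogram(x, sr, frame_len=256, hop=128, n_fft=256, n_mels=26):
    # One row per frame: log of the mel-filtered power spectrum (small floor avoids log(0))
    fb = mel_filterbank(n_mels, n_fft, sr)
    rows = []
    for frame in frame_signal(x, frame_len, hop):
        p = power_spectrum(frame, n_fft)
        rows.append([math.log(sum(w * pk for w, pk in zip(f, p)) + 1e-10) for f in fb])
    return rows


def mfcc(x, sr, n_mfcc=13, **kwargs):
    # Orthonormal DCT-II of each log-mel row, keeping the first n_mfcc coefficients
    feats = []
    for row in log_mel_spectrogram(x, sr, **kwargs):
        m = len(row)
        coeffs = []
        for c in range(n_mfcc):
            s = sum(row[j] * math.cos(math.pi * c * (j + 0.5) / m) for j in range(m))
            coeffs.append(s * math.sqrt((1.0 if c == 0 else 2.0) / m))
        feats.append(coeffs)
    return feats


if __name__ == "__main__":
    sr = 8000  # assumed sample rate
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]  # synthetic stand-in for a recording
    feats = mfcc(tone, sr)
    print(len(feats), len(feats[0]))  # 7 frames x 13 coefficients
```

Both feature matrices have one row per frame, so their height grows with utterance length; a CNN can consume them as 2-D "images", while classical classifiers need them reduced to fixed-length vectors first.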
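The classical baselines (LDA, KNN, SVM, RF) expect one fixed-length vector per utterance, whereas an MFCC matrix has a variable number of frames. As an illustration of one baseline, the sketch below assumes each utterance is summarized by its per-coefficient mean over frames and then classified by k-nearest neighbors with Euclidean distance and k = 3. The pooling choice, k, the distance metric and the helper names `summarize` and `knn_predict` are hypothetical; the abstract does not say how the authors vectorized features or configured KNN.

```python
import math
from collections import Counter


def summarize(frames):
    # Collapse a (frames x coefficients) matrix into one fixed-length vector:
    # the per-coefficient mean over time (an assumed pooling choice)
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def knn_predict(train_x, train_y, x, k=3):
    # Sort training vectors by Euclidean distance to x and take a majority vote over the k nearest
    ranked = sorted(zip((math.dist(x, tx) for tx in train_x), train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic 2-coefficient "MFCC" matrices for two digit classes, for illustration only
    utterances = [
        [[0.0, 0.1], [0.2, 0.0]],
        [[0.1, 0.0], [0.0, 0.2]],
        [[5.0, 5.1], [4.9, 5.0]],
        [[5.2, 4.8], [5.0, 5.0]],
    ]
    labels = ["zero", "zero", "one", "one"]
    train_x = [summarize(u) for u in utterances]
    query = summarize([[4.8, 5.1], [5.1, 4.9]])
    print(knn_predict(train_x, labels, query))  # one
```

Mean pooling discards the temporal order of the frames, which is one reason a convolutional model operating on the full frames-by-coefficients matrix can be expected to do better than such baselines.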
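Batch normalization, used in the proposed CNN, standardizes each feature with the mean and variance of the current mini-batch and then applies a learned scale gamma and shift beta: y = gamma * (x - mean) / sqrt(var + eps) + beta. The sketch below is a minimal training-mode forward pass over a batch of N feature vectors; in a convolutional layer the same statistics are computed per channel over the batch and all spatial positions. The function name `batch_norm_train` and eps = 1e-5 are assumptions for illustration, not details taken from the paper.

```python
import math


def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x: mini-batch of N samples, each a list of C features (or channels).
    # Returns the normalized batch plus the batch mean and (biased) variance per feature.
    n, c = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(c)]
    var = [sum((row[j] - mean[j]) ** 2 for row in x) / n for j in range(c)]
    y = [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j] for j in range(c)]
        for row in x
    ]
    return y, mean, var


if __name__ == "__main__":
    batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    y, mean, var = batch_norm_train(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
    print(mean, var)  # [2.5, 25.0] [1.25, 125.0]
    print([round(row[0], 3) for row in y])
```

At inference time, standard implementations replace the batch statistics with running averages of the mean and variance accumulated during training, so a single utterance can be normalized consistently.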
Pages: 23