Hybrid deep learning based automatic speech recognition model for recognizing non-Indian languages

Cited by: 0
Authors
Gupta, Astha [1 ]
Kumar, Rakesh [1 ]
Kumar, Yogesh [2 ]
Affiliations
[1] Chandigarh Univ, Dept Comp Sci & Engn, Mohali, Punjab, India
[2] Indus Univ, Indus Inst Technol & Engn, Ahmadabad, Gujarat, India
Keywords
Automatic Speech Recognition; Spectrogram; Short Term Fourier transform; MFCC; ResNet10; Inception V3; VGG16; DenseNet201; EfficientNetB0;
DOI
10.1007/s11042-023-16748-1
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Speech is a natural phenomenon and a significant mode of human communication, divided into two categories: human-to-human and human-to-machine. Human-to-human communication depends on the language the speaker uses. In contrast, human-to-machine communication is a technique in which machines recognize human speech and act accordingly, commonly termed Automatic Speech Recognition (ASR). Recognizing non-Indian languages is challenging due to pitch variation and other factors such as accent and pronunciation. This paper proposes a novel hybrid model based on DenseNet201 and EfficientNetB0 for speech recognition. Initially, 76,263 speech samples are taken from 11 non-Indian languages: Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Japanese, Russian, Spanish, and Persian. Once collected, these speech samples are pre-processed to remove noise. Then, Spectrogram, Short-Term Fourier Transform (STFT), Spectral Rolloff-Bandwidth, Mel-Frequency Cepstral Coefficient (MFCC), and Chroma features are extracted from the speech samples. Further, a comparative analysis of the proposed approach is presented against other Deep Learning (DL) models: ResNet10, Inception V3, VGG16, DenseNet201, and EfficientNetB0. Standard metrics such as Precision, Recall, F1-Score, Confusion Matrix, Accuracy, and Loss curves are used to evaluate the performance of each model on speech samples from all the languages mentioned above. The experimental results show that the hybrid model outperforms all the other models, giving the highest recognition accuracy of 99.84% with a loss of 0.004%.
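The abstract names several acoustic features (spectrogram, STFT, spectral rolloff) without detailing their computation. The paper's own code is not part of this record; the following is a minimal, self-contained numpy sketch of two of those features (a Hann-windowed STFT magnitude spectrogram and per-frame spectral rolloff), with illustrative frame-size and hop parameters chosen here, not taken from the paper. In practice a library such as librosa is typically used for MFCC and Chroma extraction as well.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed Short-Term Fourier Transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per frame, one column per frequency bin up to Nyquist.
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_rolloff(spec, sr, roll_percent=0.85):
    """Frequency below which roll_percent of each frame's spectral magnitude lies."""
    freqs = np.fft.rfftfreq(2 * (spec.shape[1] - 1), d=1.0 / sr)
    cumulative = np.cumsum(spec, axis=1)
    threshold = roll_percent * cumulative[:, -1:]
    idx = (cumulative >= threshold).argmax(axis=1)
    return freqs[idx]

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = stft_spectrogram(tone)          # shape: (n_frames, frame_len // 2 + 1)
rolloff = spectral_rolloff(spec, sr)   # for a pure tone, stays near 440 Hz
```

For a pure tone, nearly all spectral energy sits in a few bins around the tone's frequency, so the rolloff stays close to 440 Hz in every frame; for real speech it varies frame by frame and is used as a compact descriptor alongside MFCC and Chroma features.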
Pages: 30145 - 30166
Page count: 22
Related Papers
50 records in total
  • [41] Attention based hybrid deep learning model for wearable based stress recognition
    Tanwar, Ritu
    Phukan, Orchid Chetia
    Singh, Ghanapriya
    Pal, Pankaj Kumar
    Tiwari, Sanju
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [42] JOINT ACOUSTIC FACTOR LEARNING FOR ROBUST DEEP NEURAL NETWORK BASED AUTOMATIC SPEECH RECOGNITION
    Kundu, Souvik
    Mantena, Gautam
    Qian, Yanmin
    Tan, Tian
    Delcroix, Marc
    Sim, Khe Chai
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5025 - 5029
  • [43] DISCRIMINATIVE PIECEWISE LINEAR TRANSFORMATION BASED ON DEEP LEARNING FOR NOISE ROBUST AUTOMATIC SPEECH RECOGNITION
    Kashiwagi, Yosuke
    Saito, Daisuke
    Minematsu, Nobuaki
    Hirose, Keikichi
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 350 - 355
  • [44] The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
    Uddin, Mohammad Amaz
    Chowdury, Mohammad Salah Uddin
    Khandaker, Mayeen Uddin
    Tamam, Nissren
    Sulieman, Abdelmoneim
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (01): : 1709 - 1722
  • [45] SUPERVISED AND UNSUPERVISED ACTIVE LEARNING FOR AUTOMATIC SPEECH RECOGNITION OF LOW-RESOURCE LANGUAGES
    Syed, Ali Raza
    Rosenberg, Andrew
    Kislal, Ellen
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5320 - 5324
  • [46] A model of speech recognition for hearing-impaired listeners based on deep learning
    Rossbach, Jana
    Kollmeier, Birger
    Meyer, Bernd T.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2022, 151 (03): : 1417 - 1427
  • [47] Human Activity Recognition via Hybrid Deep Learning Based Model
    Khan, Imran Ullah
    Afzal, Sitara
    Lee, Jong Weon
    SENSORS, 2022, 22 (01)
  • [48] IMPROVING THE PERFORMANCE OF TRANSFORMER BASED LOW RESOURCE SPEECH RECOGNITION FOR INDIAN LANGUAGES
    Shetty, Vishwas M.
    Mary, Metilda Sagaya N. J.
    Umesh, S.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8279 - 8283
  • [49] Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
    Wang, Lei
    Tong, Rong
    Leung, Cheung-Chi
    Sivadas, Sunil
    Ni, Chongjia
    Ma, Bin
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2017, : 147 - 150
  • [50] Development of HMM Based Automatic Speech Recognition System For Indian English
    Garud, Anushri
    Bang, Arti
    Joshi, Shrikant
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,