Hybrid deep learning based automatic speech recognition model for recognizing non-Indian languages

Cited: 0
Authors
Gupta, Astha [1 ]
Kumar, Rakesh [1 ]
Kumar, Yogesh [2 ]
Affiliations
[1] Chandigarh Univ, Dept Comp Sci & Engn, Mohali, Punjab, India
[2] Indus Univ, Indus Inst Technol & Engn, Ahmadabad, Gujarat, India
Keywords
Automatic Speech Recognition; Spectrogram; Short-Term Fourier Transform; MFCC; ResNet10; Inception V3; VGG16; DenseNet201; EfficientNetB0
DOI
10.1007/s11042-023-16748-1
Chinese Library Classification
TP [Automation technology; Computer technology]
Discipline code
0812
Abstract
Speech is a natural phenomenon and a significant mode of human communication, divided into two categories: human-to-human and human-to-machine. Human-to-human communication depends on the language the speaker uses. In contrast, human-to-machine communication is a technique in which machines recognize human speech and act accordingly, commonly termed Automatic Speech Recognition (ASR). Recognizing non-Indian languages is challenging due to pitch variation and other factors such as accent and pronunciation. This paper proposes a novel hybrid model based on DenseNet201 and EfficientNetB0 for speech recognition. Initially, 76,263 speech samples are taken from 11 non-Indian languages: Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Japanese, Russian, Spanish, and Persian. Once collected, these speech samples are pre-processed to remove noise. Then, the Spectrogram, Short-Term Fourier Transform (STFT), Spectral Rolloff-Bandwidth, Mel-Frequency Cepstral Coefficients (MFCC), and Chroma features are used to extract features from the speech samples. Further, a comparative analysis of the proposed approach against other Deep Learning (DL) models, namely ResNet10, Inception V3, VGG16, DenseNet201, and EfficientNetB0, is presented. Standard metrics such as Precision, Recall, F1-Score, Confusion Matrix, Accuracy, and Loss curves are used to evaluate each model's performance on speech samples from all of the above languages. The experimental results show that the hybrid model outperforms all the other models, achieving the highest recognition accuracy of 99.84% with a loss of 0.004%.
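As a minimal illustration of the STFT/spectrogram feature-extraction step described in the abstract, the sketch below computes a magnitude spectrogram using only NumPy. The window length, hop size, and synthetic sine-wave input are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def stft_spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed Short-Term Fourier Transform.

    n_fft and hop are illustrative defaults; the paper does not report its settings.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping windowed frames.
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft yields n_fft // 2 + 1 frequency bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

# A 1-second, 16 kHz sine at 440 Hz stands in for a real speech sample.
sr = 16000
t = np.arange(sr) / sr
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 122)
```

In a full pipeline, such spectrogram frames (together with MFCC and Chroma features) would be stacked into image-like tensors and fed to the convolutional backbones compared in the paper.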
Pages: 30145-30166
Page count: 22
Related papers
(50 in total)
  • [31] A hybrid HMM/BN acoustic model for automatic speech recognition
    Markov, K
    Nakamura, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (03): 438-445
  • [32] Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model
    Khwaja, Mohammed Kamal
    Vikash, Peddakota
    Arulmozhivarman, P.
    Lui, Simon
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04): 895-905
  • [33] Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System
    Shahamiri, Seyed Reza
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2021, 29: 852-861
  • [34] Recognition of Handwritten Numerals of various Indian Regional Languages using Deep Learning
    Chaurasia, Saumya
    Agarwal, Suneeta
    2018 5TH IEEE UTTAR PRADESH SECTION INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING (UPCON), 2018: 582-587
  • [35] Deep Learning in Acoustic Modeling for Automatic Speech Recognition and Understanding - An Overview -
    Gavat, Inge
    Militaru, Diana
    2015 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2015
  • [36] Automatic speech recognition using advanced deep learning approaches: A survey
    Kheddar, Hamza
    Hemis, Mustapha
    Himeur, Yassine
    INFORMATION FUSION, 2024, 109
  • [37] Deep transfer learning for automatic speech recognition: Towards better generalization
    Kheddar, Hamza
    Himeur, Yassine
    Al-Maadeed, Somaya
    Amira, Abbes
    Bensaali, Faycal
    KNOWLEDGE-BASED SYSTEMS, 2023, 277
  • [38] A deep learning approach for automatic speech recognition of The Holy Qur'an recitations
    Tantawi, Imad K.
    Abushariah, Mohammad A. M.
    Hammo, Bassam H.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (04): 1017-1032
  • [39] A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Saon, George
    Kayi, Abdullah
    Buyuktosunoglu, Alper
    Kingsbury, Brian
    Kung, David
    Picheny, Michael
    INTERSPEECH 2019, 2019: 2628-2632
  • [40] End-to-End Automatic Speech Recognition with Deep Mutual Learning
    Masumura, Ryo
    Ihori, Mana
    Takashima, Akihiko
    Tanaka, Tomohiro
    Ashihara, Takanori
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020: 632-637