Hybrid deep learning based automatic speech recognition model for recognizing non-Indian languages

Cited by: 0
|
Authors
Gupta, Astha [1 ]
Kumar, Rakesh [1 ]
Kumar, Yogesh [2 ]
Institutions
[1] Chandigarh Univ, Dept Comp Sci & Engn, Mohali, Punjab, India
[2] Indus Univ, Indus Inst Technol & Engn, Ahmadabad, Gujarat, India
Keywords
Automatic Speech Recognition; Spectrogram; Short-Term Fourier Transform; MFCC; ResNet10; Inception V3; VGG16; DenseNet201; EfficientNetB0
DOI
10.1007/s11042-023-16748-1
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Speech is a natural phenomenon and a significant mode of human communication, divided into two categories: human-to-human and human-to-machine. Human-to-human communication depends on the language the speaker uses. In contrast, human-to-machine communication is a technique in which machines recognize human speech and act accordingly, often termed Automatic Speech Recognition (ASR). Recognition of non-Indian languages is challenging due to pitch variations and other factors such as accent and pronunciation. This paper proposes a novel hybrid model based on DenseNet201 and EfficientNetB0 for speech recognition. Initially, 76,263 speech samples are taken from 11 non-Indian languages: Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Japanese, Russian, Spanish, and Persian. Once collected, these speech samples are pre-processed to remove noise. Then, Spectrogram, Short-Term Fourier Transform (STFT), Spectral Rolloff-Bandwidth, Mel-Frequency Cepstral Coefficient (MFCC), and Chroma features are extracted from the speech samples. Further, a comparative analysis of the proposed approach is shown against other Deep Learning (DL) models, namely ResNet10, Inception V3, VGG16, DenseNet201, and EfficientNetB0. Standard metrics such as Precision, Recall, F1-Score, Confusion Matrix, Accuracy, and Loss curves are used to evaluate the performance of each model on speech samples from all the languages mentioned above. The experimental results show that the hybrid model outperforms all the other models, giving the highest recognition accuracy of 99.84% with a loss of 0.004%.
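The spectrogram/STFT feature-extraction step mentioned in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic tone, not the authors' implementation; the frame length (512), hop size (256), and sampling rate (16 kHz) are assumed values chosen for the example:

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Naive Short-Term Fourier Transform: Hann-windowed FFT per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided spectrum: shape (n_frames, frame_len // 2 + 1)
    return np.fft.rfft(frames, axis=1)

# A synthetic 1-second 440 Hz tone at 16 kHz stands in for a speech sample.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

S = stft(tone)
spectrogram = np.abs(S) ** 2            # power spectrogram, fed to a CNN as an image
peak_bin = int(spectrogram.mean(axis=0).argmax())
peak_hz = peak_bin * sr / 512           # FFT bin index -> frequency in Hz
print(spectrogram.shape, peak_hz)       # peak frequency sits near 440 Hz
```

Treating the resulting 2-D power spectrogram as an image is what allows image-classification backbones such as DenseNet201 and EfficientNetB0 to be applied to speech. In practice, MFCC and Chroma features are typically computed with a library such as librosa rather than by hand.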
Pages: 30145-30166
Page count: 22
Related Papers
50 records in total
  • [1] Hybrid deep learning based automatic speech recognition model for recognizing non-Indian languages
    Astha Gupta
    Rakesh Kumar
    Yogesh Kumar
    Multimedia Tools and Applications, 2024, 83 : 30145 - 30166
  • [2] ASRoIL: a comprehensive survey for automatic speech recognition of Indian languages
    Singh, Amitoj
    Kadyan, Virender
    Kumar, Munish
    Bassan, Nancy
    ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (05) : 3673 - 3704
  • [4] WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages
    Choudhary, Tripti
    Goyal, Vishal
    Bansal, Atul
    BIG DATA MINING AND ANALYTICS, 2023, 6 (01) : 85 - 91
  • [5] DEEP NEURAL NETWORKS BASED AUTOMATIC SPEECH RECOGNITION FOR FOUR ETHIOPIAN LANGUAGES
    Abate, Solomon Teferra
    Tachbelie, Martha Ylfiru
    Schultz, Tanja
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8274 - 8278
  • [6] Improving Deep Learning based Automatic Speech Recognition for Gujarati
    Raval, Deepang
    Pathak, Vyom
    Patel, Muktan
    Bhatt, Brijesh
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [7] LAGNet: A Hybrid Deep Learning Model for Automatic Modulation Recognition
    Li, Zhuo
    Lu, Guangyue
    Li, Yuxin
    Zhou, Hao
    Li, Huan
    2024 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC 2024, 2024
  • [8] Customized deep learning based Turkish automatic speech recognition system supported by language model
    Gormez, Yasin
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [10] DISTRIBUTED DEEP LEARNING STRATEGIES FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Kingsbury, Brian
    Saon, George
    Kung, David
    Picheny, Michael
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5706 - 5710