RAttSR: A Novel Low-Cost Reconstructed Attention-Based End-to-End Speech Recognizer

Cited by: 0
Authors
Bachchu Paul
Santanu Phadikar
Affiliations
[1] Vidyasagar University, Department of Computer Science
[2] Maulana Abul Kalam Azad University of Technology, Department of Computer Science and Engineering
[3] West Bengal
Keywords
Automatic speech recognition; Mel-spectrogram; Convolutional neural network; Long short-term memory; Attention model
DOI: Not available
Abstract
Voice commands are expected to become a primary mode of interaction with smart devices in the next generation of interfaces. However, language remains a significant barrier to the widespread use of such devices, and even existing models for widely spoken languages require extensive parameters, resulting in high computational cost. A major drawback of the latest advanced models is that they cannot run on devices with constrained resources. This paper proposes a novel end-to-end speech recognizer based on a low-cost Bidirectional Long Short-Term Memory (BiLSTM) attention model. Mel-spectrograms of the speech signals are generated and fed into the proposed neural attention model to classify isolated words. The model consists of three convolution layers followed by two BiLSTM layers, which encode a vector of length 64 against which attention over the input sequence is computed. The convolution layers characterize the relationships among the energy bins of the spectrogram, the BiLSTM network mitigates long-range dependence on the input sequence, and the attention block identifies the most significant region of the input sequence, reducing the computational cost of classification. The vector encoded by the attention head is fed to a three-layer fully connected network for recognition. The model requires only 133K parameters, fewer than several current state-of-the-art models for isolated word recognition. Two datasets are used in this study: the Speech Command Dataset (SCD) and a self-made dataset of fifteen spoken color names in the Bengali dialect. With the proposed technique, validation and test accuracy on the Bengali color dataset reach 98.82% and 98.95%, respectively, outperforming current state-of-the-art models in both accuracy and model size. When the SCD is trained with the same network, the average test accuracy is 96.95%. To support the proposed model, the results are compared with recent state-of-the-art models, and the comparison shows the superiority of the proposed approach.
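The abstract describes the pipeline only at a high level (mel-spectrogram input, three convolution layers, two BiLSTM layers, an attention block producing a 64-dimensional encoding, and a three-layer fully connected classifier). Below is a minimal PyTorch sketch of such an architecture, offered as an illustration rather than the authors' implementation: kernel sizes, channel widths, hidden sizes, and the additive attention formulation are all assumptions, so its parameter count will not match the reported 133K.

```python
# Minimal sketch of a conv -> BiLSTM -> attention -> FC isolated-word classifier,
# loosely following the architecture described in the abstract. All hyperparameters
# below are assumed for illustration, not taken from the paper.
import torch
import torch.nn as nn

class RAttSRSketch(nn.Module):
    def __init__(self, n_mels=40, n_classes=15, hidden=32, att_dim=64):
        super().__init__()
        # Three convolution layers characterize relationships among the
        # energy bins of the mel-spectrogram (channel sizes are assumed).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        feat_dim = 16 * (n_mels // 4)       # channels x pooled mel bins per frame
        # Two BiLSTM layers model temporal context in both directions.
        self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        # Simple additive attention producing one 64-dimensional context vector.
        self.att_proj = nn.Linear(2 * hidden, att_dim)
        self.att_score = nn.Linear(att_dim, 1, bias=False)
        # Three fully connected layers for recognition.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, att_dim), nn.ReLU(),
            nn.Linear(att_dim, att_dim), nn.ReLU(),
            nn.Linear(att_dim, n_classes),
        )

    def forward(self, mel):                     # mel: (batch, 1, n_mels, frames)
        x = self.conv(mel)                      # (batch, C, mels', frames')
        x = x.permute(0, 3, 1, 2).flatten(2)    # (batch, frames', C * mels')
        seq, _ = self.bilstm(x)                 # (batch, frames', 2 * hidden)
        weights = torch.softmax(
            self.att_score(torch.tanh(self.att_proj(seq))), dim=1)  # (batch, frames', 1)
        context = (weights * seq).sum(dim=1)    # attention-weighted summary vector
        return self.classifier(context)         # class logits

if __name__ == "__main__":
    model = RAttSRSketch()
    dummy = torch.randn(2, 1, 40, 100)          # batch of 2 mel-spectrograms
    print(model(dummy).shape)                   # torch.Size([2, 15])
```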
Pages: 2454-2476
Number of pages: 22
Related papers
50 results in total
  • [31] Assessing Knee OA Severity with CNN attention-based end-to-end architectures
    Gorriz, Marc
    Antony, Joseph
    McGuinness, Kevin
    Giro-i-Nieto, Xavier
    O'Connor, Noel E.
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 102, 2019, 102 : 197 - 214
  • [32] Gaussian Prediction based Attention for Online End-to-End Speech Recognition
    Hou, Junfeng
    Zhang, Shiliang
    Dai, Lirong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3692 - 3696
  • [33] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition
    Sun, Sining
    Guo, Pengcheng
    Xie, Lei
    Hwang, Mei-Yuh
    IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1826 - 1838
  • [34] Large Margin Training for Attention Based End-to-End Speech Recognition
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    INTERSPEECH 2019, 2019, : 246 - 250
  • [35] Portable end-to-end ground system for low-cost mission support
    Lam, B
    ACTA ASTRONAUTICA, 1996, 39 (9-12) : 909 - 915
  • [36] Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer
    Shangguan, Yuan
    Knister, Kate
    He, Yanzhang
    McGraw, Ian
    Beaufays, Francoise
    INTERSPEECH 2020, 2020, : 591 - 595
  • [37] Context-dependent Label Smoothing Regularization for Attention-based End-to-End Code-Switching Speech Recognition
    Huang, Zheying
    Li, Peng
    Xu, Ji
    Zhang, Pengyuan
    Yan, Yonghong
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [38] A Novel End-to-End Image Caption Based on Multimodal Attention
    Li X.-M.
    Yue G.
    Chen G.-W.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2020, 49 (06): : 867 - 874
  • [39] SpecTextor: End-to-End Attention-based Mechanism for Dense Text Generation in Sports Journalism
    Ghosh, Indrajeet
    Ivler, Matthew
    Ramamurthy, Sreenivasan Ramasamy
    Roy, Nirmalya
    2022 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP 2022), 2022, : 362 - 367
  • [40] AESGRU: An Attention-Based Temporal Correlation Approach for End-to-End Machine Health Perception
    Zhang, Weiting
    Yang, Dong
    Wang, Hongchao
    Zhang, Jun
    Gidlund, Mikael
    IEEE ACCESS, 2019, 7 : 141487 - 141497