End-to-end ASR framework for Indian-English accent: using speech CNN-based segmentation

Authors
Ahmed G. [1]
Lawaye A.A. [1]
Affiliation
[1] Baba Ghulam Shah Badshah University, J&K, Rajouri
Keywords
Automatic speech recognition; CNN–BiLSTM; Endpoint detection; Speech segmentation; Wav2vec
DOI
10.1007/s10772-023-10053-w
Abstract
The quality of Automatic Speech Recognition (ASR) has improved significantly over time, with the focus shifting from short utterances to longer audio signals. In short utterances, speech endpoints are distinct, which ensures a good user experience. In long-form audio, however, these endpoints are less clear, which leads to unnecessary resource consumption and works against ASR's primary goal of generating highly readable, well-formatted transcriptions. In this study, we introduce an ASR framework tailored to the Indian English accent. It uses Speech Segments Endpoint Detection (SSED), built from Mel-spectrogram features, the short-time energy of the signal, and a hybrid Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) model. In experiments on a 29-hour audio dataset of Indian English accented speech, the CNN–BiLSTM classification model for speech endpoint detection attained 98.67% accuracy in training and 93.62% accuracy in validation. The resulting ASR system achieved a Word Error Rate (WER) of 11.63%. Notably, the segmentation model reduced the dataset length by 16.4%, making it a useful contribution to ASR technology. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
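Two quantities named in the abstract, short-time energy and Word Error Rate, have standard textbook definitions that can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the frame length of 400 samples, and the hop of 160 samples are assumptions chosen for the example, and the abstract does not state the framing parameters or the exact WER scoring rules used in the study.

```python
def short_time_energy(samples, frame_len=400, hop=160):
    """Return the sum of squared amplitudes for each frame.

    frame_len and hop are assumed values (25 ms and 10 ms at 16 kHz),
    not parameters taken from the paper.
    """
    energies = []
    last_start = max(len(samples) - frame_len + 1, 1)
    for start in range(0, last_start, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Frames with low energy are candidates for non-speech, which is why energy is a common input to endpoint detectors; in the framework described above it is one feature alongside the Mel-spectrogram, with the CNN–BiLSTM making the actual classification.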
Pages: 903-918
Page count: 15