Exploring end-to-end framework towards Khasi speech recognition system

被引:3
|
作者
Syiem, Bronson [1 ]
Singh, L. Joyprakash [1 ]
机构
[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India
关键词
Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model;
D O I
10.1007/s10772-021-09811-5
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Building a conventional automatic speech recognition (ASR) system based on hidden Markov model (HMM)/deep neural network (DNN) makes the system complex as it requires various modules such as acoustic, lexicon, linguistic resources, language models etc. particularly with the low resource languages. In contrast, End-to-End architecture has greatly simplifies the model building process by representing complex modules with a simple deep network and by replacing the use of linguistic resources with a data-driven learning techniques. In this paper, we present our prior work by exploring End-to-End (E2E) framework for Khasi speech recognition system and the novel extension towards the development of speech corpora for standard Khasi dialect. We implemented the proposed E2E model by using Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, significant improvement was achieved using the proposed method particularly with the connectionist temporal classification (CTC) with a character error rate (CER) of 5.04%.
引用
收藏
页码:419 / 424
页数:6
相关论文
共 50 条
  • [1] Exploring end-to-end framework towards Khasi speech recognition system
    Bronson Syiem
    L. Joyprakash Singh
    International Journal of Speech Technology, 2021, 24 : 419 - 424
  • [2] TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION
    Liu, Alexander H.
    Hsu, Wei-Ning
    Auli, Michael
    Baevski, Alexei
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 221 - 228
  • [3] EXPLORING NEURAL TRANSDUCERS FOR END-TO-END SPEECH RECOGNITION
    Battenberg, Eric
    Chen, Jitong
    Child, Rewon
    Coates, Adam
    Gaur, Yashesh
    Li, Yi
    Liu, Hairong
    Satheesh, Sanjeev
    Sriram, Anuroop
    Zhu, Zhenyao
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 206 - 213
  • [4] Towards end-to-end speech recognition with transfer learning
    Chu-Xiong Qin
    Dan Qu
    Lian-Hai Zhang
    EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [5] Towards end-to-end speech recognition with transfer learning
    Qin, Chu-Xiong
    Qu, Dan
    Zhang, Lian-Hai
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [6] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
  • [7] TOWARDS LANGUAGE-UNIVERSAL END-TO-END SPEECH RECOGNITION
    Kim, Suyoun
    Seltzer, Michael L.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4914 - 4918
  • [8] Towards End-to-End Speech Recognition with Recurrent Neural Networks
    Graves, Alex
    Jaitly, Navdeep
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1764 - 1772
  • [9] Exploring End-to-End Techniques for Low-Resource Speech Recognition
    Bataev, Vladimir
    Korenevsky, Maxim
    Medennikov, Ivan
    Zatvornitskiy, Alexander
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 32 - 41
  • [10] EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION
    Huang, Mingkun
    Lu, Yizhou
    Wang, Lan
    Qian, Yanmin
    Yu, Kai
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 524 - 531