Sub-word Based End-to-End Speech Recognition for an Under-Resourced Language: Amharic

Cited by: 0
Authors
Gebreegziabher, Nirayo Hailu [1]
Nuernberger, Andreas [1]
Affiliations
[1] Otto von Guericke Univ, Fac Comp Sci, Data & Knowledge Engn Grp, Magdeburg, Germany
Keywords
Speech Recognition; Phoneme; Grapheme; End-to-End Mode; Syllable Units; NEURAL-NETWORKS;
DOI
10.1109/smc42975.2020.9283401
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we focus on end-to-end speech recognition for a less-resourced language, Amharic. The result can be integrated with other tasks such as spoken content retrieval. We explore three models, built from Convolutional Neural Networks, Recurrent Neural Networks, and Connectionist Temporal Classification, for end-to-end speech recognition on a less-resourced language. Further, we study the feasibility of an end-to-end system with 1-best output that keeps the network parameters and computational resources minimal. The paper concentrates on finding a more suitable sub-lexical unit for the Amharic end-to-end speech recognition system, one that can also serve as an audio indexing unit. We present the first results comparing grapheme-, phoneme-, and syllable-based end-to-end speech recognition systems for our target language. The models are evaluated on an Amharic speech corpus of approximately 52 hours containing read speech, audiobooks, and multi-genre radio programs. On the test set, we report a character error rate (CER) of 19.21% and a syllable error rate (SER) of 39.98% for a syllable-based end-to-end model with no lexicon or language model integrated.
Pages: 3466-3470
Number of pages: 5
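
The abstract reports a character error rate (CER) and a syllable error rate (SER) but does not describe the scoring procedure. The sketch below assumes the conventional definition, in which the error rate is the Levenshtein (edit) distance between reference and hypothesis unit sequences divided by the reference length. The function names `edit_distance` and `error_rate` and the romanized example tokens are hypothetical illustrations, not taken from the paper or its corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length.

    Passing lists of characters gives a CER; passing lists of
    syllables gives an SER (assumed conventional definition).
    """
    if len(ref_units) == 0:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


# Hypothetical romanized tokens, for illustration only.
cer = error_rate(list("selam"), list("salam"))    # one substitution in 5 characters
ser = error_rate(["se", "lam"], ["sa", "lam"])    # one substitution in 2 syllables
print(f"CER = {cer:.2%}, SER = {ser:.2%}")
```

Because the same function works on any sequence, one wrong character in this toy example affects one of five characters but one of two syllables, which shows how an error rate over longer units can be higher than the CER for the same output.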