Sub-word Based End-to-End Speech Recognition for an Under-Resourced Language: Amharic

Authors
Gebreegziabher, Nirayo Hailu [1]
Nuernberger, Andreas [1]
Affiliations
[1] Otto von Guericke Univ, Fac Comp Sci, Data & Knowledge Engn Grp, Magdeburg, Germany
Keywords
Speech Recognition; Phoneme; Grapheme; End-to-End Mode; Syllable Units; NEURAL-NETWORKS;
DOI
10.1109/smc42975.2020.9283401
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this work, we focused on end-to-end speech recognition for a less-resourced language, Amharic. The result can be integrated with other tasks such as spoken content retrieval. We explored three models, which consist of Convolutional Neural Networks, Recurrent Neural Networks, and Connectionist Temporal Classification, for end-to-end speech recognition on a less-resourced language. Further, we studied the possibility of building an end-to-end system with 1-best output while keeping the network parameters and computational resources minimal. The paper pays particular attention to finding a more suitable sub-lexical unit for the Amharic end-to-end speech recognition system, one that can also be used as an audio indexing unit. We present the first results comparing grapheme-, phoneme-, and syllable-based end-to-end speech recognition systems for our target language. The models are evaluated on an Amharic speech corpus of approximately 52 hours containing read speech, audiobooks, and multi-genre radio programs. On the test set, we report a character error rate (CER) of 19.21% and a syllable error rate (SER) of 39.98% for a syllable-based end-to-end model with no lexicon or language model integrated.
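The CER and SER figures above are unit-level error rates. As a minimal sketch, assuming the conventional definition (total Levenshtein edit distance between hypothesis and reference, divided by the total number of reference units), such a rate could be computed as follows. The function names and the romanized toy tokens are hypothetical illustrations, not the authors' scoring code or data.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level error rate in percent: total edits / total reference units."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_units = sum(len(r) for r in refs)
    if total_units == 0:
        raise ValueError("reference corpus is empty")
    return 100.0 * total_edits / total_units


# Hypothetical toy example: the same utterance scored at two granularities.
ref_syllables = ["sa", "la", "m"]
hyp_syllables = ["sa", "le", "m"]
ser = error_rate([ref_syllables], [hyp_syllables])    # one wrong unit out of three
cer = error_rate([list("salam")], [list("salem")])    # one wrong unit out of five
```

In this toy case a single substituted vowel costs one of five characters but one of three syllables, which illustrates how the same output can score a lower CER than SER.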
Pages: 3466-3470
Number of pages: 5