A study of transformer-based end-to-end speech recognition system for Kazakh language

Cited by: 14
Authors
Mamyrbayev, Orken [1]
Oralbekova, Dina [1,2]
Alimhan, Keylan [1,3]
Turdalykyzy, Tolganay [1]
Othman, Mohamed [4]
Affiliations
[1] Inst Informat & Computat Technol CS MES RK, Alma Ata, Kazakhstan
[2] Satbayev Univ, Alma Ata, Kazakhstan
[3] LN Gumilyov Eurasian Natl Univ, Nur Sultan, Kazakhstan
[4] Univ Putra Malaysia, Kuala Lumpur, Malaysia
DOI
10.1038/s41598-022-12260-y
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
Today, the Transformer model, which enables parallelization and relies on its own self-attention mechanism, is widely used in speech recognition. The great advantage of this architecture is its fast training speed and the absence of the sequential operations required by recurrent neural networks. In this work, Transformer models and an end-to-end model based on connectionist temporal classification (CTC) were considered for building an automatic Kazakh speech recognition system. Kazakh belongs to the agglutinative languages and has limited data available for building speech recognition systems. Some studies have shown that the Transformer model improves system performance for low-resource languages. Our experiments showed that the joint use of Transformer and CTC models improved the performance of the Kazakh speech recognition system and, with an integrated language model, achieved the best character error rate of 3.7% on a clean dataset.
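To make the joint Transformer/CTC objective described in the abstract concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation). A Transformer encoder produces frame-level states, a CTC head scores them directly against the label sequence, and a Transformer decoder supplies the attention-based cross-entropy term; the two losses are combined with a fixed weight. All dimensions, the weight `ctc_weight=0.3`, and the simplified token handling (no start/end symbols, blank reused as padding) are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridCTCTransformer(nn.Module):
    """Sketch of a Transformer ASR model trained with a joint CTC/attention loss."""

    def __init__(self, n_feats=80, n_tokens=100, d_model=256, n_heads=4,
                 n_enc_layers=6, n_dec_layers=3, ctc_weight=0.3, blank_id=0):
        super().__init__()
        self.ctc_weight = ctc_weight
        self.blank_id = blank_id
        self.front = nn.Linear(n_feats, d_model)  # project acoustic features to model dim
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_enc_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), n_dec_layers)
        self.embed = nn.Embedding(n_tokens, d_model)   # decoder token embedding
        self.ctc_head = nn.Linear(d_model, n_tokens)   # frame-level CTC projection
        self.att_head = nn.Linear(d_model, n_tokens)   # label-level attention projection

    def forward(self, feats, feat_lens, tokens, token_lens):
        # Encode acoustic features: (batch, frames, n_feats) -> (batch, frames, d_model).
        enc = self.encoder(self.front(feats))

        # CTC branch: per-frame log-probabilities aligned to the labels by the CTC loss.
        ctc_logp = F.log_softmax(self.ctc_head(enc), dim=-1).transpose(0, 1)  # (frames, batch, vocab)
        ctc_loss = F.ctc_loss(ctc_logp, tokens, feat_lens, token_lens,
                              blank=self.blank_id, zero_infinity=True)

        # Attention branch: teacher-forced decoder with a causal mask over target positions.
        # (Start/end-of-sentence handling is omitted in this sketch.)
        causal = torch.triu(torch.full((tokens.size(1), tokens.size(1)), float("-inf")),
                            diagonal=1)
        dec = self.decoder(self.embed(tokens), enc, tgt_mask=causal)
        att_loss = F.cross_entropy(self.att_head(dec).transpose(1, 2), tokens,
                                   ignore_index=self.blank_id)

        # Joint objective: weighted sum of the CTC and attention losses.
        return self.ctc_weight * ctc_loss + (1.0 - self.ctc_weight) * att_loss
```

The weighted combination of CTC and attention losses follows the joint-training scheme discussed for Transformer end-to-end models in related paper [10] below; the specific weight used by the authors for Kazakh is not given here and would need to be tuned.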
Pages: 11
Related Papers
50 records in total
  • [1] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev, Orken
    Oralbekova, Dina
    Alimhan, Keylan
    Turdalykyzy, Tolganay
    Othman, Mohamed
    [J]. SCIENTIFIC REPORTS, 2022, 12
  • [2] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [3] An End-to-End Transformer-Based Automatic Speech Recognition for Qur'an Reciters
    Hadwan, Mohammed
    Alsayadi, Hamzah A.
    AL-Hagree, Salah
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02) : 3471 - 3487
  • [4] On-device Streaming Transformer-based End-to-End Speech Recognition
    Oh, Yoo Rhee
    Park, Kiyoung
    [J]. INTERSPEECH 2021, 2021, : 967 - 968
  • [5] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    [J]. INTERSPEECH 2020, 2020, : 5011 - 5015
  • [6] An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition
    Yue, Fengpeng
    Ko, Tom
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [7] An Empirical Study on Transformer-Based End-to-End Speech Recognition with Novel Decoder Masking
    Weng, Shi-Yan
    Chiu, Hsuan-Sheng
    Chen, Berlin
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 518 - 522
  • [8] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [9] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [10] Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
    Karita, Shigeki
    Soplin, Nelson Enrique Yalta
    Watanabe, Shinji
    Delcroix, Marc
    Ogawa, Atsunori
    Nakatani, Tomohiro
    [J]. INTERSPEECH 2019, 2019, : 1408 - 1412