A study of transformer-based end-to-end speech recognition system for Kazakh language

Cited by: 14
Authors
Mamyrbayev, Orken [1]
Oralbekova, Dina [1,2]
Alimhan, Keylan [1,3]
Turdalykyzy, Tolganay [1]
Othman, Mohamed [4]
Affiliations
[1] Inst Informat & Computat Technol CS MES RK, Alma Ata, Kazakhstan
[2] Satbayev Univ, Alma Ata, Kazakhstan
[3] LN Gumilyov Eurasian Natl Univ, Nur Sultan, Kazakhstan
[4] Univ Putra Malaysia, Kuala Lumpur, Malaysia
DOI
10.1038/s41598-022-12260-y
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
Today, the Transformer model, which enables parallelization and relies on its own self-attention mechanism, is widely used in speech recognition. The great advantage of this architecture is its fast training speed and the absence of the sequential operations required by recurrent neural networks. In this work, Transformer models and an end-to-end model based on connectionist temporal classification (CTC) were considered for building an automatic Kazakh speech recognition system. Kazakh belongs to the agglutinative languages and has limited data available for building speech recognition systems. Some studies have shown that the Transformer model improves system performance for low-resource languages. Our experiments showed that the joint use of Transformer and CTC models improved the performance of the Kazakh speech recognition system and, with an integrated language model, achieved the best character error rate of 3.7% on a clean dataset.
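To make the joint Transformer/CTC objective described in the abstract concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation). A Transformer encoder produces frame-level states, a CTC head scores them directly against the label sequence, and a Transformer decoder supplies the attention-based cross-entropy term; the two losses are combined with a fixed weight. All dimensions, the weight `ctc_weight=0.3`, and the simplified token handling (no start/end symbols, blank reused as padding) are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridCTCTransformer(nn.Module):
    """Sketch of a Transformer ASR model trained with a joint CTC/attention loss."""

    def __init__(self, n_feats=80, n_tokens=100, d_model=256, n_heads=4,
                 n_enc_layers=6, n_dec_layers=3, ctc_weight=0.3, blank_id=0):
        super().__init__()
        self.ctc_weight = ctc_weight
        self.blank_id = blank_id
        self.front = nn.Linear(n_feats, d_model)  # project acoustic features to model dim
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_enc_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), n_dec_layers)
        self.embed = nn.Embedding(n_tokens, d_model)   # decoder token embedding
        self.ctc_head = nn.Linear(d_model, n_tokens)   # frame-level CTC projection
        self.att_head = nn.Linear(d_model, n_tokens)   # label-level attention projection

    def forward(self, feats, feat_lens, tokens, token_lens):
        # Encode acoustic features: (batch, frames, n_feats) -> (batch, frames, d_model).
        enc = self.encoder(self.front(feats))

        # CTC branch: per-frame log-probabilities aligned to the labels by the CTC loss.
        ctc_logp = F.log_softmax(self.ctc_head(enc), dim=-1).transpose(0, 1)  # (frames, batch, vocab)
        ctc_loss = F.ctc_loss(ctc_logp, tokens, feat_lens, token_lens,
                              blank=self.blank_id, zero_infinity=True)

        # Attention branch: teacher-forced decoder with a causal mask over target positions.
        # (Start/end-of-sentence handling is omitted in this sketch.)
        causal = torch.triu(torch.full((tokens.size(1), tokens.size(1)), float("-inf")),
                            diagonal=1)
        dec = self.decoder(self.embed(tokens), enc, tgt_mask=causal)
        att_loss = F.cross_entropy(self.att_head(dec).transpose(1, 2), tokens,
                                   ignore_index=self.blank_id)

        # Joint objective: weighted sum of the CTC and attention losses.
        return self.ctc_weight * ctc_loss + (1.0 - self.ctc_weight) * att_loss
```

The weighted combination of CTC and attention losses follows the joint-training scheme discussed for Transformer end-to-end models in related paper [10] below; the specific weight used by the authors for Kazakh is not given here and would need to be tuned.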
Pages: 11
Related Papers
50 records in total
  • [1] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev, Orken
    Oralbekova, Dina
    Alimhan, Keylan
    Turdalykyzy, Tolganay
    Othman, Mohamed
    [J]. SCIENTIFIC REPORTS, 2022, 12
  • [2] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [3] An End-to-End Transformer-Based Automatic Speech Recognition for Qur'an Reciters
    Hadwan, Mohammed
    Alsayadi, Hamzah A.
    AL-Hagree, Salah
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02) : 3471 - 3487
  • [4] On-device Streaming Transformer-based End-to-End Speech Recognition
    Oh, Yoo Rhee
    Park, Kiyoung
    [J]. INTERSPEECH 2021, 2021, : 967 - 968
  • [5] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    [J]. INTERSPEECH 2020, 2020, : 5011 - 5015
  • [6] An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition
    Yue, Fengpeng
    Ko, Tom
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [7] An Empirical Study on Transformer-Based End-to-End Speech Recognition with Novel Decoder Masking
    Weng, Shi-Yan
    Chiu, Hsuan-Sheng
    Chen, Berlin
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 518 - 522
  • [8] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [9] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [10] Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
    Karita, Shigeki
    Soplin, Nelson Enrique Yalta
    Watanabe, Shinji
    Delcroix, Marc
    Ogawa, Atsunori
    Nakatani, Tomohiro
    [J]. INTERSPEECH 2019, 2019, : 1408 - 1412