Fast offline transformer-based end-to-end automatic speech recognition for real-world applications

Times cited: 4

Authors:

Oh, Yoo Rhee [1]

Park, Kiyoung [1]

Park, Jeon Gue [1]

Affiliation:

[1] Electronics and Telecommunications Research Institute (ETRI), Artificial Intelligence Research Laboratory, Daejeon, South Korea

Source:

ETRI Journal | 2022 / Vol. 44 / No. 3

Keywords:

connectionist temporal classification; end-to-end; speech recognition; transformer; CTC; attention; network; ASR

DOI:

10.4218/etrij.2021-0106

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline code:

0808; 0809

Abstract:

With recent advances in technology, automatic speech recognition (ASR) has been widely adopted in real-world applications, and converting large amounts of speech into text accurately and efficiently with limited resources has become more important than ever. In this study, we propose a method for rapidly recognizing a large speech database with a transformer-based end-to-end model. Transformers have advanced the state of the art in many fields, but they are difficult to apply to long sequences. We therefore propose and evaluate several techniques for accelerating the recognition of real-world speech: decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments on the LibriSpeech dataset and on real-world Korean ASR tasks verify the proposed methods. The proposed system converts 8 h of real-world meeting speech into text in less than 3 min with a character error rate (CER) of 10.73%, a 27.1% relative reduction compared with conventional systems.
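The headline figures in the abstract can be unpacked with a little arithmetic. The sketch below is an illustration, not the authors' evaluation code. It makes two assumptions. First, it uses the common definition of the real-time factor, RTF = processing time / audio duration. Second, it reads "27.1% relatively lower" as (baseline - proposed) / baseline. Because decoding took *less than* 3 min, the RTF computed here is only an upper bound. The baseline CER is an implied value derived from the stated reduction, not a number reported in the abstract.

```python
# Figures quoted in the abstract.
AUDIO_HOURS = 8.0           # hours of real-world meeting speech
MAX_DECODE_MINUTES = 3.0    # decoding took less than this, so results are bounds
PROPOSED_CER = 10.73        # character error rate (%) of the proposed system
RELATIVE_REDUCTION = 0.271  # 27.1% relative CER reduction vs. conventional systems


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: processing time / audio duration (< 1 means faster than real time)."""
    return decode_seconds / audio_seconds


def implied_baseline_cer(new_cer: float, rel_reduction: float) -> float:
    """Assumes rel_reduction = (baseline - new) / baseline and solves for baseline."""
    return new_cer / (1.0 - rel_reduction)


rtf_bound = real_time_factor(MAX_DECODE_MINUTES * 60, AUDIO_HOURS * 3600)
speedup_bound = 1.0 / rtf_bound  # how many times faster than real time, at minimum
baseline = implied_baseline_cer(PROPOSED_CER, RELATIVE_REDUCTION)

print(f"RTF upper bound: {rtf_bound:.5f}")
print(f"At least {speedup_bound:.0f}x faster than real time")
print(f"Implied conventional-system CER: {baseline:.2f}%")
```

Under these assumptions, 180 s of decoding for 28,800 s of audio gives an RTF below 0.00625, meaning the system runs at least 160 times faster than real time. The implied CER of the conventional systems is about 14.72%.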


Pages: 476-490

Page count: 15
