Fast offline transformer-based end-to-end automatic speech recognition for real-world applications

Cited by: 4
Authors
Oh, Yoo Rhee [1]
Park, Kiyoung [1]
Park, Jeon Gue [1]
Affiliations
[1] Elect & Telecommun Res Inst, Artificial Intelligence Res Lab, Daejeon, South Korea
Keywords
connectionist temporal classification; end-to-end; speech recognition; transformer; CTC; ATTENTION; NETWORK; ASR;
DOI
10.4218/etrij.2021-0106
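Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes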
0808 ; 0809 ;
Abstract
With recent advances in technology, automatic speech recognition (ASR) has come into wide use in real-world applications, and converting large amounts of speech into text accurately with limited resources has become more important than ever. In this study, we propose a method to rapidly recognize a large speech database using a transformer-based end-to-end model. Transformers have improved the state of the art in many fields, but they are not easy to apply to long sequences. We therefore propose and test several techniques to accelerate the recognition of real-world speech: decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments on the LibriSpeech dataset and real-world Korean ASR tasks verify the proposed methods. In these experiments, the proposed system converts 8 h of speech spoken at real-world meetings into text in less than 3 min with a 10.73% character error rate, a relative reduction of 27.1% compared with conventional systems.
Pages: 476-490
Page count: 15
Related papers
50 items in total
  • [41] Continual Learning for Monolingual End-to-End Automatic Speech Recognition
    Vander Eeckt, Steven
    Van Hamme, Hugo
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 459 - 463
  • [42] STRUCTURED SPARSE ATTENTION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Xue, Jiabin
    Zheng, Tieran
    Han, Jiqing
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7044 - 7048
  • [43] Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices
    Ben Letaifa, Leila
    Rouas, Jean-Luc
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 439 - 443
  • [44] Simple Data Augmented Transformer End-To-End Tibetan Speech Recognition
    Yang, Xiaodong
    Wang, Weizhe
    Yang, Hongwu
    Jiang, Jiaolong
    [J]. 2020 IEEE 3RD INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP 2020), 2020, : 148 - 152
  • [45] ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT
    Wang, Yiming
    Chen, Tongfei
    Xu, Hainan
    Ding, Shuoyang
    Lv, Hang
    Shao, Yiwen
    Peng, Nanyun
    Xie, Lei
    Watanabe, Shinji
    Khudanpur, Sanjeev
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 136 - 143
  • [46] SymFormer: End-to-End Symbolic Regression Using Transformer-Based Architecture
    Vastl, Martin
    Kulhanek, Jonas
    Kubalik, Jiri
    Derner, Erik
    Babuska, Robert
    [J]. IEEE ACCESS, 2024, 12 : 37840 - 37849
  • [47] Speech-and-Text Transformer: Exploiting Unpaired Text for End-to-End Speech Recognition
    Wang, Qinyi
    Zhou, Xinyuan
    Li, Haizhou
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2023, 12 (01)
  • [48] End-to-end information fusion method for transformer-based stereo matching
    Xu, Zhenghui
    Wang, Jingxue
    Guo, Jun
    [J]. MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (06)
  • [49] UNIFIED END-TO-END SPEECH RECOGNITION AND ENDPOINTING FOR FAST AND EFFICIENT SPEECH SYSTEMS
    Bijwadia, Shaan
    Chang, Shuo-yiin
    Li, Bo
    Sainath, Tara
    Zhang, Chao
    He, Yanzhang
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 310 - 316
  • [50] TOWARDS A ROMANIAN END-TO-END AUTOMATIC SPEECH RECOGNITION BASED ON DEEPSPEECH2
    Avram, Andrei-Marius
    Pais, Vasile
    Tufis, Dan
    [J]. PROCEEDINGS OF THE ROMANIAN ACADEMY SERIES A-MATHEMATICS PHYSICS TECHNICAL SCIENCES INFORMATION SCIENCE, 2020, 21 (04): : 395 - 402