Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition

被引:14
|
作者
Zhang, Shiliang [1 ]
Lei, Ming [1 ]
Yan, Zhijie [1 ]
机构
[1] Alibaba Grp, Machine Intelligence Technol, Beijing, Peoples R China
来源
关键词
speech recognition; spelling correction; CTC; End-to-End; Transformer; ERRORS;
D O I
10.21437/Interspeech.2019-1290
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Connectionist Temporal Classification (CTC) based end-to-end speech recognition system usually need to incorporate an external language model by using WFST-based decoding in order to achieve promising results. This is more essential to Mandarin speech recognition since it owns a special phenomenon, namely homophone, which causes a lot of substitution errors. The linguistic information introduced by language model is somehow helpful to distinguish these substitution errors. In this work, we propose a transformer based spelling correction model to automatically correct errors, especially the substitution errors, made by CTC-based Mandarin speech recognition system. Specifically, we investigate to use the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output to train a transformer with encoder-decoder architecture, which is much similar to machine translation. Experimental results in a 20,000 hours Mandarin speech recognition task show that the proposed spelling correction model can achieve a CER of 3.41%, which results in 22.9% and 53.2% relative improvement compared to the baseline CTC-based systems decoded with and without language model, respectively.
引用
收藏
页码:2180 / 2184
页数:5
相关论文
共 50 条
  • [31] End-to-end Tibetan Ando dialect speech recognition based on hybrid CTC/attention architecture
    Sun, Jingwen
    Zhou, Gang
    Yang, Hongwu
    Wang, Man
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 628 - 632
  • [32] Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices
    Ben Letaifa, Leila
    Rouas, Jean-Luc
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 439 - 443
  • [33] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    Amodei, Dario
    Ananthanarayanan, Sundaram
    Anubhai, Rishita
    Bai, Jingliang
    Battenberg, Eric
    Case, Carl
    Casper, Jared
    Catanzaro, Bryan
    Cheng, Qiang
    Chen, Guoliang
    Chen, Jie
    Chen, Jingdong
    Chen, Zhijie
    Chrzanowski, Mike
    Coates, Adam
    Diamos, Greg
    Ding, Ke
    Du, Niandong
    Elsen, Erich
    Engel, Jesse
    Fang, Weiwei
    Fan, Linxi
    Fougner, Christopher
    Gao, Liang
    Gong, Caixia
    Hannun, Awni
    Han, Tony
    Johannes, Lappi Vaino
    Jiang, Bing
    Ju, Cai
    Jun, Billy
    LeGresley, Patrick
    Lin, Libby
    Liu, Junjie
    Liu, Yang
    Li, Weigao
    Li, Xiangang
    Ma, Dongpeng
    Narang, Sharan
    Ng, Andrew
    Ozair, Sherjil
    Peng, Yiping
    Prenger, Ryan
    Qian, Sheng
    Quan, Zongfeng
    Raiman, Jonathan
    Rao, Vinay
    Satheesh, Sanjeev
    Seetapun, David
    Sengupta, Shubho
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [34] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [35] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [36] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev Orken
    Oralbekova Dina
    Alimhan Keylan
    Turdalykyzy Tolganay
    Othman Mohamed
    [J]. Scientific Reports, 12
  • [37] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev, Orken
    Oralbekova, Dina
    Alimhan, Keylan
    Turdalykyzy, Tolganay
    Othman, Mohamed
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [38] Semantic Data Augmentation for End-to-End Mandarin Speech Recognition
    Sun, Jianwei
    Tang, Zhiyuan
    Yin, Hengxin
    Wang, Wei
    Zhao, Xi
    Zhao, Shuaijiang
    Lei, Xiaoning
    Zou, Wei
    Li, Xiangang
    [J]. INTERSPEECH 2021, 2021, : 1269 - 1273
  • [39] Online Compressive Transformer for End-to-End Speech Recognition
    Leong, Chi-Hang
    Huang, Yu-Han
    Chien, Jen-Tzung
    [J]. INTERSPEECH 2021, 2021, : 2082 - 2086
  • [40] End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    [J]. SYMMETRY-BASEL, 2019, 11 (05):