A SPELLING CORRECTION MODEL FOR END-TO-END SPEECH RECOGNITION

被引:0
|
作者
Guo, Jinxi [1 ]
Sainath, Tara N. [2 ]
Weiss, Ron J. [2 ]
机构
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Google Inc, Mountain View, CA USA
关键词
speech recognition; sequence-to-sequence; attention models; spelling correction; language model;
D O I
10.1109/icassp.2019.8683745
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is only trained on transcribed audio-text pairs, which leads to performance degradation especially on rare words. While there have been a variety of work that look at incorporating an external LM trained on text-only data into the end-to-end framework, none of them have taken into account the characteristic error distribution made by the model. In this paper, we propose a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model results in an 18.6% relative improvement in WER over the baseline model when directly correcting top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.
引用
收藏
页码:5651 / 5655
页数:5
相关论文
共 50 条
  • [1] Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems
    Wang, Xiaoqiang
    Liu, Yanqing
    Li, Jinyu
    Miljanic, Veljko
    Zhao, Sheng
    Khalil, Hosam
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 3089 - 3097
  • [2] Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition
    Zhang, Shiliang
    Lei, Ming
    Yan, Zhijie
    [J]. INTERSPEECH 2019, 2019, : 2180 - 2184
  • [3] End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-switching Speech Recognition
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Bai, Ye
    Tao, Jianhua
    Liu, Xuefei
    Wen, Zhengqi
    [J]. INTERSPEECH 2021, 2021, : 266 - 270
  • [4] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    [J]. 2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312
  • [5] Hybrid end-to-end model for Kazakh speech recognition
    Mamyrbayev O.Z.
    Oralbekova D.O.
    Alimhan K.
    Nuranbayeva B.M.
    [J]. International Journal of Speech Technology, 2023, 26 (2) : 261 - 270
  • [6] Residual Language Model for End-to-end Speech Recognition
    Tsunoo, Emiru
    Kashiwagi, Yosuke
    Narisetty, Chaitanya
    Watanabe, Shinji
    [J]. INTERSPEECH 2022, 2022, : 3899 - 3903
  • [7] MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL
    Toshniwal, Shubham
    Sainath, Tara N.
    Weiss, Ron J.
    Li, Bo
    Moreno, Pedro
    Weinstein, Eugene
    Rao, Kanishka
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4904 - 4908
  • [8] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [9] End-to-End Speech Recognition in Russian
    Markovnikov, Nikita
    Kipyatkova, Irina
    Lyakso, Elena
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 377 - 386
  • [10] Overview of end-to-end speech recognition
    Wang, Song
    Li, Guanyu
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187