Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems

被引:7
|
作者
Wang, Xiaoqiang [1 ]
Liu, Yanqing [1 ]
Li, Jinyu [2 ]
Miljanic, Veljko [2 ]
Zhao, Sheng [1 ]
Khalil, Hosam [2 ]
机构
[1] Microsoft, Beijing 100080, Peoples R China
[2] Microsoft, Redmond, WA 98052 USA
关键词
Context modeling; Decoding; Training; Iron; Indexes; Transformers; Task analysis; Speech recognition; contextual spelling correc- tion; contextual biasing; non-autoregressive;
D O I
10.1109/TASLP.2022.3205753
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person names, music list, proper nouns, etc. Existing methods mainly include contextual LM biasing and adding bias encoder into end-to-end ASR models. In this work, we introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. The proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We also propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model. The proposed model is a general biasing solution which is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over ASR system and outperforms traditional biasing methods. Compared to the AR solution, the NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
引用
收藏
页码:3089 / 3097
页数:9
相关论文
共 50 条
  • [21] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [22] Overview of end-to-end speech recognition
    Wang, Song
    Li, Guanyu
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [23] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    INTERSPEECH 2019, 2019, : 2140 - 2144
  • [24] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [25] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
  • [26] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [27] Towards an End-to-End Speech Recognition Model for Accurate Quranic Recitation
    Al-Fadhli, Sumayya
    Al-Harbi, Hajar
    Cherif, Asma
    2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA, 2023,
  • [28] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [29] Exploring end-to-end framework towards Khasi speech recognition system
    Bronson Syiem
    L. Joyprakash Singh
    International Journal of Speech Technology, 2021, 24 : 419 - 424
  • [30] TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition
    Yoon, Ji Won
    Lee, Hyeonseung
    Kim, Hyung Yong
    Cho, Won Ik
    Kim, Nam Soo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 (29) : 1626 - 1638