Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems

Cited by: 3
Authors
Wang, Xiaoqiang [1]
Liu, Yanqing [1]
Li, Jinyu [2]
Miljanic, Veljko [2]
Zhao, Sheng [1]
Khalil, Hosam [2]
Affiliations
[1] Microsoft, Beijing 100080, People's Republic of China
[2] Microsoft, Redmond, WA 98052 USA
Keywords
Context modeling; Decoding; Training; Iron; Indexes; Transformers; Task analysis; Speech recognition; contextual spelling correction; contextual biasing; non-autoregressive
DOI
10.1109/TASLP.2022.3205753
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. It aims to improve recognition performance by biasing the ASR system toward particular context phrases such as person names, music lists, and proper nouns. Existing methods mainly include contextual LM biasing and adding a bias encoder to end-to-end ASR models. In this work, we introduce a novel approach to contextual biasing that adds a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. The proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We also propose filtering algorithms to handle large context lists, and performance balancing mechanisms to control the biasing degree of the model. The proposed model is a general biasing solution that is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods. Compared to the AR solution, the NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
Pages: 3089-3097
Page count: 9
Related papers
50 in total
  • [1] A SPELLING CORRECTION MODEL FOR END-TO-END SPEECH RECOGNITION
    Guo, Jinxi
    Sainath, Tara N.
    Weiss, Ron J.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5651 - 5655
  • [2] DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION
    Pundak, Golan
    Sainath, Tara N.
    Prabhavalkar, Rohit
    Kannan, Anjuli
    Zhao, Ding
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 418 - 425
  • [3] TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION
    Liu, Alexander H.
    Hsu, Wei-Ning
    Auli, Michael
    Baevski, Alexei
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 221 - 228
  • [4] End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-switching Speech Recognition
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Bai, Ye
    Tao, Jianhua
    Liu, Xuefei
    Wen, Zhengqi
    [J]. INTERSPEECH 2021, 2021, : 266 - 270
  • [5] Contextual Speech Recognition in End-to-End Neural Network Systems using Beam Search
    Williams, Ian
    Kannan, Anjuli
    Aleksic, Petar
    Rybach, David
    Sainath, Tara N.
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2227 - 2231
  • [6] Towards end-to-end speech recognition with transfer learning
    Qin, Chu-Xiong
    Qu, Dan
    Zhang, Lian-Hai
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [7] Towards end-to-end speech recognition with transfer learning
    Chu-Xiong Qin
    Dan Qu
    Lian-Hai Zhang
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [8] Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition
    Zhang, Shiliang
    Lei, Ming
    Yan, Zhijie
    [J]. INTERSPEECH 2019, 2019, : 2180 - 2184
  • [9] PERSONALIZATION STRATEGIES FOR END-TO-END SPEECH RECOGNITION SYSTEMS
    Gourav, Aditya
    Liu, Linda
    Gandhe, Ankur
    Gu, Yile
    Lan, Guitang
    Huang, Xiangyang
    Kalmane, Shashank
    Tiwari, Gautam
    Filimonov, Denis
    Rastrow, Ariya
    Stolcke, Andreas
    Bulyko, Ivan
    Alexa, Amazon
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7348 - 7352
  • [10] TOWARDS LANGUAGE-UNIVERSAL END-TO-END SPEECH RECOGNITION
    Kim, Suyoun
    Seltzer, Michael L.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4914 - 4918