Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems

Cited by: 3
Authors
Wang, Xiaoqiang [1]
Liu, Yanqing [1]
Li, Jinyu [2]
Miljanic, Veljko [2]
Zhao, Sheng [1]
Khalil, Hosam [2]
Affiliations
[1] Microsoft, Beijing 100080, People's Republic of China
[2] Microsoft, Redmond, WA 98052 USA
Keywords
Context modeling; Decoding; Training; Iron; Indexes; Transformers; Task analysis; Speech recognition; contextual spelling correction; contextual biasing; non-autoregressive
DOI
10.1109/TASLP.2022.3205753
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. It aims to improve recognition performance by biasing the ASR system toward particular context phrases, such as person names, music lists, and proper nouns. Existing methods mainly comprise contextual language model (LM) biasing and adding a bias encoder to the end-to-end ASR model. In this work, we introduce a novel approach to contextual biasing that adds a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. The proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We also propose filtering algorithms to handle large context lists and performance-balancing mechanisms to control the degree of biasing. The proposed model is a general, domain-insensitive biasing solution that can be adopted in different scenarios. Experiments show that the proposed method achieves up to 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods. Compared to the AR solution, the NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
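The abstract reports its headline result as a relative WER reduction. As a rough illustration of how such a figure is typically computed, the sketch below uses the standard word-level edit-distance definition of WER. The function names, the example sentences, and the WER values 0.10 and 0.049 are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (sub + del + ins) divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER that the improved system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical utterance: the ASR output misrecognizes a person name.
reference = "call veljko miljanic now"
asr_output = "call velcro millionaire now"
print(word_error_rate(reference, asr_output))   # two substitutions over four words

# Hypothetical corpus-level WERs, chosen only to show the arithmetic.
print(relative_wer_reduction(0.10, 0.049))
```

Under this definition, a relative reduction of 51% means the corrected system keeps about 49% of the baseline's word errors; it does not mean the absolute WER dropped by 51 percentage points.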
Pages: 3089-3097
Page count: 9