RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT

Cited by: 7
Authors
Xu, Liyan [1 ,2 ]
Gu, Yile [1 ]
Kolehmainen, Jari [1 ]
Khan, Haidar [1 ]
Gandhe, Ankur [1 ]
Rastrow, Ariya [1 ]
Stolcke, Andreas [1 ]
Bulyko, Ivan [1 ]
Affiliations
[1] Amazon Alexa AI, Seattle, WA 98121 USA
[2] Emory Univ, Atlanta, GA 30322 USA
Keywords
masked language model; BERT; second-pass rescoring; pretrained model; minimum WER training
DOI
10.1109/ICASSP43922.2022.9747118
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Second-pass rescoring is an important component in automatic speech recognition (ASR) systems; it improves the outputs of a first-pass decoder by performing lattice rescoring or n-best re-ranking. While pretraining with a masked language model (MLM) objective has achieved great success in various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, to incorporate the improvements of a discriminative loss into fine-tuning of deep bidirectional pretrained models for ASR. In particular, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. We further propose an alternative discriminative loss. This approach, which we call RescoreBERT, reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
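As a rough illustration of the n-best re-ranking and MWER objective the abstract refers to, the following is a minimal Python sketch, not the authors' implementation. It assumes each hypothesis carries a first-pass cost and a second-pass (e.g. BERT) cost, where lower is better, combined linearly with a hypothetical interpolation weight `beta`; the names `word_errors`, `mwer_loss`, and `rerank` are illustrative. The loss follows the usual MWER formulation: the expected number of word errors under a softmax over the n-best list, with the mean error count subtracted as a baseline.

```python
import math
from typing import List


def word_errors(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (h != r))  # substitution or match
        prev = cur
    return prev[-1]


def combined_costs(first_pass: List[float], second_pass: List[float],
                   beta: float = 1.0) -> List[float]:
    """Hypothetical linear interpolation of first- and second-pass costs."""
    return [f + beta * s for f, s in zip(first_pass, second_pass)]


def mwer_loss(first_pass: List[float], second_pass: List[float],
              errors: List[int], beta: float = 1.0) -> float:
    """Expected word errors over the n-best list, minus the mean (baseline)."""
    costs = combined_costs(first_pass, second_pass, beta)
    # Softmax over negated costs, shifted by the max for numerical stability.
    m = max(-c for c in costs)
    exps = [math.exp(-c - m) for c in costs]
    z = sum(exps)
    probs = [e / z for e in exps]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))


def rerank(first_pass: List[float], second_pass: List[float],
           beta: float = 1.0) -> int:
    """Index of the hypothesis with the lowest combined cost."""
    costs = combined_costs(first_pass, second_pass, beta)
    return min(range(len(costs)), key=lambda i: costs[i])


if __name__ == "__main__":
    ref = "turn on the lights".split()
    nbest = ["turn on the lights".split(), "turn on the light".split()]
    errs = [word_errors(h, ref) for h in nbest]
    print(errs, mwer_loss([1.0, 1.2], [0.5, 2.0], errs), rerank([1.0, 1.2], [0.5, 2.0]))
```

In actual training the costs would be differentiable model outputs (for example, framework tensors), so minimizing this loss pushes probability mass toward hypotheses with fewer word errors than the n-best average; plain floats are used here only to keep the sketch self-contained.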
Pages: 6117-6121
Page count: 5