RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT

Cited by: 7
Authors
Xu, Liyan [1, 2]
Gu, Yile [1]
Kolehmainen, Jari [1]
Khan, Haidar [1]
Gandhe, Ankur [1]
Rastrow, Ariya [1]
Stolcke, Andreas [1]
Bulyko, Ivan [1]
Affiliations
[1] Amazon Alexa AI, Seattle, WA 98121 USA
[2] Emory Univ, Atlanta, GA 30322 USA
Keywords
masked language model; BERT; second-pass rescoring; pretrained model; minimum WER training
DOI
10.1109/ICASSP43922.2022.9747118
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Second-pass rescoring is an important component of automatic speech recognition (ASR) systems that improves the outputs of a first-pass decoder through lattice rescoring or n-best re-ranking. While pretraining with a masked language model (MLM) objective has achieved great success in various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, to incorporate the improvements of a discriminative loss into fine-tuning of deep bidirectional pretrained models for ASR. In particular, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. We further propose an alternative discriminative loss. This approach, which we call RescoreBERT, reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
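The n-best re-ranking and MWER training mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used MWER formulation (expected word errors over the n-best list, with the mean error subtracted), a "lower score is better" convention, and a simple weighted sum of first-pass and second-pass scores. The function names rerank and mwer_loss and the weight parameter are hypothetical.

```python
import math


def rerank(first_pass, second_pass, weight=1.0):
    """Return the index of the best hypothesis in an n-best list.

    Assumption: lower scores are better, and the final score is the
    first-pass score plus a weighted second-pass (rescoring) score.
    """
    combined = [f + weight * s for f, s in zip(first_pass, second_pass)]
    return min(range(len(combined)), key=combined.__getitem__)


def mwer_loss(scores, word_errors):
    """Expected relative word errors over an n-best list.

    Assumption: hypothesis posteriors are a softmax over negated scores,
    and each hypothesis' word-error count is offset by the list average.
    """
    best = min(scores)  # shift scores for numerical stability
    exps = [math.exp(-(s - best)) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    mean_err = sum(word_errors) / len(word_errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, word_errors))
```

Under these assumptions the loss is negative when probability mass sits on hypotheses with fewer errors than average and positive in the opposite case, so minimizing it pushes the rescoring model to favor low-error hypotheses.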
Pages: 6117-6121
Page count: 5
Related papers
50 in total
  • [1] Personalization for BERT-based Discriminative Speech Recognition Rescoring
    Kolehmainen, Jari
    Gu, Yile
    Gourav, Aditya
    Shivakumar, Prashanth Gurunath
    Gandhe, Ankur
    Rastrow, Ariya
    Bulyko, Ivan
    [J]. INTERSPEECH 2023, 2023, : 366 - 370
  • [2] Scaling Laws for Discriminative Speech Recognition Rescoring Models
    Gu, Yile
    Shivakumar, Prashanth Gurunath
    Kolehmainen, Jari
    Gandhe, Ankur
    Rastrow, Ariya
    Bulyko, Ivan
    [J]. INTERSPEECH 2023, 2023, : 471 - 475
  • [3] BERT-based Semantic Model for Rescoring N-best Speech Recognition List
    Fohr, Dominique
    Illina, Irina
    [J]. INTERSPEECH 2021, 2021, : 1867 - 1871
  • [4] Discriminative incorporation of explicitly trained tone models into lattice based rescoring for Mandarin speech recognition
    Huang, Hao
    Zhu, Jie
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 1541 - 1544
  • [5] ATTRIBUTE BASED LATTICE RESCORING IN SPONTANEOUS SPEECH RECOGNITION
    Chen, I-Fan
    Siniscalchi, Sabato Marco
    Lee, Chin-Hui
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition
    Li, Wei
    Qin, James
    Chiu, Chung-Cheng
    Pang, Ruoming
    He, Yanzhang
    [J]. INTERSPEECH 2020, 2020, : 2122 - 2126
  • [7] A Study on Lattice Rescoring with Knowledge Scores for Automatic Speech Recognition
    Siniscalchi, Sabato Marco
    Li, Jinyu
    Lee, Chin-Hui
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 517 - 520
  • [8] Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition
    Roux, Thibault Baneras
    Rouvier, Mickael
    Wottawa, Jane
    Dufour, Richard
    [J]. INTERSPEECH 2022, 2022, : 3968 - 3972
  • [9] Context-aware RNNLM Rescoring for Conversational Speech Recognition
    Wei, Kun
    Guo, Pengcheng
    Lv, Hang
    Tu, Zhen
    Xie, Lei
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [10] BAYESIAN DISCRIMINATIVE ADAPTATION FOR SPEECH RECOGNITION
    Raut, C. K.
    Gales, M. J. F.
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4361 - 4364