RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT

Times cited: 7

Authors:

Xu, Liyan [1,2]

Gu, Yile [1]

Kolehmainen, Jari [1]

Khan, Haidar [1]

Gandhe, Ankur [1]

Rastrow, Ariya [1]

Stolcke, Andreas [1]

Bulyko, Ivan [1]

Affiliations:

[1] Amazon Alexa AI, Seattle, WA 98121 USA

[2] Emory Univ, Atlanta, GA 30322 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

masked language model; BERT; second-pass rescoring; pretrained model; minimum WER training;

DOI:

10.1109/ICASSP43922.2022.9747118

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Second-pass rescoring is an important component of automatic speech recognition (ASR) systems: it improves the output of a first-pass decoder through lattice rescoring or n-best re-ranking. While pretraining with a masked language model (MLM) objective has achieved great success in various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. In particular, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, bringing the benefits of a discriminative loss to the fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. We further propose an alternative discriminative loss. This approach, which we call RescoreBERT, reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline trained without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
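The MWER objective and n-best re-ranking named in the abstract can be sketched as follows. This is a minimal plain-Python illustration of the generic MWER formulation for n-best rescoring, not the RescoreBERT implementation. It assumes each hypothesis has a first-pass cost and a second-pass (rescoring) cost, both lower-is-better (e.g. negative log-likelihoods), combined with a hypothetical interpolation weight `beta`. The function names `word_errors`, `combined_costs`, `nbest_posteriors`, `mwer_loss`, and `rerank` are illustrative only. In a real system the loss would be computed in an autodiff framework so that gradients flow back into the rescoring model's scores.

```python
import math
from typing import List, Sequence


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a hypothesis and the reference."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra hypothesis word (insertion)
                curr[j - 1] + 1,         # missing reference word (deletion)
                prev[j - 1] + (h != r),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def combined_costs(first_pass: List[float], second_pass: List[float],
                   beta: float = 1.0) -> List[float]:
    """Interpolate first- and second-pass costs; beta is an assumed weight."""
    return [f + beta * s for f, s in zip(first_pass, second_pass)]


def nbest_posteriors(costs: List[float]) -> List[float]:
    """Softmax over negated costs, renormalized over the n-best list only."""
    best = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-(c - best)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def mwer_loss(first_pass: List[float], second_pass: List[float],
              hyps: List[List[str]], ref: List[str], beta: float = 1.0) -> float:
    """Expected word errors under the n-best posterior, minus the n-best mean."""
    probs = nbest_posteriors(combined_costs(first_pass, second_pass, beta))
    errors = [word_errors(h, ref) for h in hyps]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))


def rerank(first_pass: List[float], second_pass: List[float],
           hyps: List[List[str]], beta: float = 1.0) -> List[str]:
    """n-best re-ranking: return the hypothesis with the lowest combined cost."""
    costs = combined_costs(first_pass, second_pass, beta)
    return hyps[costs.index(min(costs))]


if __name__ == "__main__":
    ref = "the cat sat".split()
    hyps = [ref, "the cat sad".split(), "a cat sat".split()]
    first = [0.0, 0.0, 0.0]
    print(mwer_loss(first, [0.0, 0.0, 0.0], hyps, ref))  # uniform posterior
    print(mwer_loss(first, [0.0, 5.0, 5.0], hyps, ref))  # favors the correct hypothesis
    print(" ".join(rerank(first, [0.0, 5.0, 5.0], hyps)))  # prints "the cat sat"
```

With equal scores the posterior is uniform and the loss is (up to rounding) zero; as the second-pass costs favor the error-free hypothesis, the loss approaches minus the mean n-best error count. Subtracting the mean is a common variance-reduction choice in MWER training and does not change which hypothesis the loss rewards.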


Pages: 6117-6121

Page count: 5
