RescoreBERT: Discriminative Speech Recognition Rescoring with BERT

Cited by: 7
Authors
Xu, Liyan [1, 2]
Gu, Yile [1]
Kolehmainen, Jari [1]
Khan, Haidar [1]
Gandhe, Ankur [1]
Rastrow, Ariya [1]
Stolcke, Andreas [1]
Bulyko, Ivan [1]
Affiliations
[1] Amazon Alexa AI, Seattle, WA 98121 USA
[2] Emory Univ, Atlanta, GA 30322 USA
Keywords
masked language model; BERT; second-pass rescoring; pretrained model; minimum WER training
DOI
10.1109/ICASSP43922.2022.9747118
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Second-pass rescoring is an important component in automatic speech recognition (ASR) systems that is used to improve the outputs from a first-pass decoder by implementing a lattice rescoring or n-best re-ranking. While pretraining with a masked language model (MLM) objective has received great success in various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, to incorporate the improvements of a discriminative loss into fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. We further propose an alternative discriminative loss. This approach, which we call RescoreBERT, reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
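The MWER objective and n-best re-ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each hypothesis carries a single combined score treated as a cost (lower is better) and that word-level edit distances to the reference are already computed. The function names `mwer_loss` and `rerank` are hypothetical.

```python
import math


def mwer_loss(scores, edit_distances):
    """Expected relative word errors over one n-best list.

    scores: per-hypothesis costs, lower is better (an assumption of this sketch).
    edit_distances: word-level edit distance of each hypothesis to the reference.
    """
    if not scores or len(scores) != len(edit_distances):
        raise ValueError("scores and edit_distances must be non-empty and equal in length")

    # Softmax over negated costs; subtracting the minimum keeps exp() stable.
    lowest = min(scores)
    weights = [math.exp(-(s - lowest)) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Subtracting the mean error acts as a baseline, so the loss is zero
    # when all hypotheses are equally likely.
    mean_err = sum(edit_distances) / len(edit_distances)
    return sum(p * (e - mean_err) for p, e in zip(probs, edit_distances))


def rerank(scores):
    """Return the index of the lowest-cost hypothesis."""
    return min(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions the loss becomes negative when probability mass concentrates on hypotheses with fewer errors than average, which is the behavior a discriminative rescoring objective rewards.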
Pages: 6117-6121
Page count: 5