CONFIDENCE ESTIMATION FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION

Cited by: 19
Authors
Li, Qiujia [1 ,3 ]
Qiu, David [2 ]
Zhang, Yu [2 ]
Li, Bo [2 ]
He, Yanzhang [2 ]
Woodland, Philip C. [1 ]
Cao, Liangliang [2 ]
Strohman, Trevor [2 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA 94043 USA
[3] Google, Mountain View, CA 94043 USA
Keywords
confidence scores; end-to-end ASR;
DOI
10.1109/ICASSP39728.2021.9414920
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
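The softmax-based baseline that the abstract describes can be illustrated with a short sketch: at each decoding step of an autoregressive model, the confidence of the emitted token is taken to be its softmax probability, and per-token scores are aggregated into an utterance-level score. This is an illustrative sketch of the general baseline only, not the paper's CEM; the function names and the mean aggregation are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidences(step_logits, token_ids):
    """Softmax probability of each decoded token,
    a common confidence baseline for autoregressive decoders."""
    return [softmax(logits)[t] for logits, t in zip(step_logits, token_ids)]

def utterance_confidence(confs):
    """Aggregate token confidences; the arithmetic mean is one simple choice."""
    return sum(confs) / len(confs)

# Two decoding steps over a 3-token vocabulary (hypothetical logits).
step_logits = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.0]]
decoded = [0, 1]  # token chosen at each step
confs = token_confidences(step_logits, decoded)
score = utterance_confidence(confs)
```

Because end-to-end models are often overconfident, these raw softmax scores tend to cluster near 1.0 even on misrecognised words, which is the gap the paper's confidence estimation module is designed to close.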
Pages: 6388-6392
Page count: 5