Residual Language Model for End-to-end Speech Recognition

Cited by: 3
Authors
Tsunoo, Emiru [1]
Kashiwagi, Yosuke [1]
Narisetty, Chaitanya [2]
Watanabe, Shinji [2]
Affiliations
[1] Sony Grp Corp, Tokyo, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; language model; attention-based encoder-decoder; internal language model estimation;
DOI
10.21437/Interspeech.2022-10557
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
End-to-end automatic speech recognition struggles to adapt to unknown target-domain speech despite being trained on large amounts of paired audio-text data. Recent studies estimate the model's linguistic bias as an internal language model (LM). To adapt effectively to the target domain, the internal LM is subtracted from the posterior during inference and fused with an external target-domain LM. However, this fusion complicates inference, and the internal LM estimate may not always be accurate. In this paper, we propose a simple external LM fusion method for domain adaptation that accounts for internal LM estimation during training. We directly model the residual factor between the external and internal LMs, namely the residual LM. To train the residual LM stably, we propose smoothing the estimated internal LM and optimizing it with a combination of cross-entropy and mean-squared-error losses, which reflect the statistical behavior of the internal LM on target-domain data. We experimentally confirmed that the proposed residual LM outperforms internal LM estimation in most cross-domain and intra-domain scenarios.
Pages: 3899-3903
Page count: 5