Residual Language Model for End-to-end Speech Recognition

Cited by: 3
Authors
Tsunoo, Emiru [1]
Kashiwagi, Yosuke [1]
Narisetty, Chaitanya [2]
Watanabe, Shinji [2]
Affiliations
[1] Sony Grp Corp, Tokyo, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; language model; attention-based encoder-decoder; internal language model estimation;
DOI
10.21437/Interspeech.2022-10557
CLC classification number
O42 [声学];
Subject classification code
070206 ; 082403 ;
Abstract
End-to-end automatic speech recognition struggles to adapt to speech from unknown target domains despite being trained on large amounts of paired audio-text data. Recent studies estimate the linguistic bias of the model as an internal language model (LM). To adapt effectively to the target domain, the internal LM is subtracted from the posterior during inference and fused with an external target-domain LM. However, this fusion complicates inference, and the estimate of the internal LM is not always accurate. In this paper, we propose a simple external LM fusion method for domain adaptation that accounts for internal LM estimation during training. We directly model the residual factor between the external and internal LMs, namely the residual LM. To train the residual LM stably, we propose smoothing the estimated internal LM and optimizing it with a combination of cross-entropy and mean-squared-error losses, which account for the statistical behavior of the internal LM on the target-domain data. We experimentally confirmed that the proposed residual LM outperforms internal LM estimation in most cross-domain and intra-domain scenarios.
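The scoring difference the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: function and variable names (`ilme_fusion_score`, `residual_fusion_score`, the fusion weights `lam_ext`, `lam_ilm`, `lam`) are assumptions made for clarity, and the inputs stand for per-token log-probabilities already computed by the respective models.

```python
def ilme_fusion_score(log_p_e2e, log_p_ilm, log_p_ext,
                      lam_ext=0.5, lam_ilm=0.5):
    """Conventional internal-LM-estimation (ILME) fusion: subtract the
    estimated internal LM from the end-to-end posterior and add the
    external target-domain LM (weights are illustrative)."""
    return log_p_e2e - lam_ilm * log_p_ilm + lam_ext * log_p_ext


def residual_fusion_score(log_p_e2e, log_p_residual, lam=0.5):
    """Residual-LM fusion: one residual LM stands in for the difference
    between the external and internal LMs, so decoding adds only a
    single extra log-probability term."""
    return log_p_e2e + lam * log_p_residual
```

When the residual LM exactly captures the difference of the external and internal log-probabilities and the weights match, the two scores coincide; the practical benefit is that inference needs one LM forward pass instead of two.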
Pages: 3899-3903
Number of pages: 5
Related papers
50 records total
  • [1] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [2] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    [J]. 2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312
  • [3] ADVERSARIAL TRAINING OF END-TO-END SPEECH RECOGNITION USING A CRITICIZING LANGUAGE MODEL
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6176 - 6180
  • [4] NEURAL-FST CLASS LANGUAGE MODEL FOR END-TO-END SPEECH RECOGNITION
    Bruguier, Antoine
    Le, Duc
    Prabhavalkar, Rohit
    Li, Dangna
    Liu, Zhe
    Wang, Bo
    Chang, Eun
    Peng, Fuchun
    Kalinli, Ozlem
    Seltzer, Michael L.
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6107 - 6111
  • [5] TOWARDS LANGUAGE-UNIVERSAL END-TO-END SPEECH RECOGNITION
    Kim, Suyoun
    Seltzer, Michael L.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4914 - 4918
  • [6] End-to-End Large Vocabulary Speech Recognition for the Serbian Language
    Popovic, Branislav
    Pakoci, Edvin
    Pekar, Darko
    [J]. SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 343 - 352
  • [7] LEVERAGING LANGUAGE ID IN MULTILINGUAL END-TO-END SPEECH RECOGNITION
    Waters, Austin
    Gaur, Neeraj
    Haghani, Parisa
    Moreno, Pedro
    Qu, Zhongdi
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 928 - 935
  • [8] Noise Robust End-to-End Speech Recognition For Bangla Language
    Sumit, Sakhawat Hosain
    Al Muntasir, Tareq
    Zaman, M. M. Arefin
    Nandi, Rabindra Nath
    Sourov, Tanvir
    [J]. 2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,
  • [9] Hybrid end-to-end model for Kazakh speech recognition
    Mamyrbayev, O. Z.
    Oralbekova, D. O.
    Alimhan, K.
    Nuranbayeva, B. M.
    [J]. International Journal of Speech Technology, 2023, 26 (2) : 261 - 270
  • [10] A SPELLING CORRECTION MODEL FOR END-TO-END SPEECH RECOGNITION
    Guo, Jinxi
    Sainath, Tara N.
    Weiss, Ron J.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5651 - 5655