Residual Language Model for End-to-end Speech Recognition

被引:3
|
作者
Tsunoo, Emiru [1 ]
Kashiwagi, Yosuke [1 ]
Narisetty, Chaitanya [2 ]
Watanabe, Shinji [2 ]
机构
[1] Sony Grp Corp, Tokyo, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
来源
关键词
speech recognition; language model; attention-based encoder-decoder; internal language model estimation;
D O I
10.21437/Interspeech.2022-10557
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
End-to-end automatic speech recognition suffers from adaptation to unknown target domain speech despite being trained with a large amount of paired audio-text data. Recent studies estimate a linguistic bias of the model as the internal language model (LM). To effectively adapt to the target domain, the internal LM is subtracted from the posterior during inference and fused with an external target-domain LM. However, this fusion complicates the inference and the estimation of the internal LM may not always be accurate. In this paper, we propose a simple external LM fusion method for domain adaptation, which considers the internal LM estimation in its training. We directly model the residual factor of the external and internal LMs, namely the residual LM. To stably train the residual LM, we propose smoothing the estimated internal LM and optimizing it with a combination of cross-entropy and mean-squared-error losses, which consider the statistical behaviors of the internal LM in the target domain data. We experimentally confirmed that the proposed residual LM performs better than the internal LM estimation in most of the cross-domain and intra-domain scenarios.
引用
收藏
页码:3899 / 3903
页数:5
相关论文
共 50 条
  • [31] Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition
    Meng, Zhong
    Gaur, Yashesh
    Kanda, Naoyuki
    Li, Jinyu
    Chen, Xie
    Wu, Yu
    Gong, Yifan
    [J]. INTERSPEECH 2022, 2022, : 2608 - 2612
  • [32] EXTENDING PARROTRON: AN END-TO-END, SPEECH CONVERSION AND SPEECH RECOGNITION MODEL FOR ATYPICAL SPEECH
    Doshi, Rohan
    Chen, Youzheng
    Jiang, Liyang
    Zhang, Xia
    Biadsy, Fadi
    Ramabhadran, Bhuvana
    Chu, Fang
    Rosenberg, Andrew
    Moreno, Pedro J.
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6988 - 6992
  • [33] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [34] End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
    Kimura, Naoki
    Su, Zixiong
    Saeki, Takaaki
    [J]. INTERSPEECH 2020, 2020, : 1025 - 1026
  • [35] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
  • [36] END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS
    Hori, Takaaki
    Cho, Jaejin
    Watanabe, Shinji
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 389 - 396
  • [37] Location-Based End-to-End Speech Recognition with Multiple Language Models
    Lin, Zhijie
    Lin, Kaiyang
    Chen, Shiling
    Li, Linlin
    Zhao, Zhou
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9975 - 9976
  • [38] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    [J]. INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [39] END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS
    Petridis, Stavros
    Li, Zuwei
    Pantic, Maja
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2592 - 2596
  • [40] Review of End-to-End Streaming Speech Recognition
    Wang, Aohui
    Zhang, Long
    Song, Wenyu
    Meng, Jie
    [J]. Computer Engineering and Applications, 1600, 2 (22-33):