Joint Unsupervised Adaptation of N-gram and RNN Language Models via LDA-based Hybrid Mixture Modeling

Authors
Masumura, Ryo [1]
Asami, Taichi [1]
Masataki, Hirokazu [1]
Aono, Yushi [1]
Affiliation
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
This paper reports an initial study of unsupervised adaptation that assumes the simultaneous use of both n-gram and recurrent neural network (RNN) language models (LMs) in automatic speech recognition (ASR). Combining n-gram and RNN LMs is known to be more effective for ASR than using either one alone. However, although various unsupervised adaptation methods have been examined for either n-gram LMs or RNN LMs, no method that adapts both simultaneously has been presented. To handle different LMs in a unified unsupervised adaptation framework, our key idea is to introduce mixture modeling for both n-gram LMs and RNN LMs. Mixture modeling can handle multiple LMs simultaneously, and unsupervised adaptation is accomplished simply by adjusting the mixture weights using the recognition hypothesis of the input speech. This paper proposes joint unsupervised adaptation achieved by hybrid mixture modeling that uses both n-gram mixture models and RNN mixture models. We present latent Dirichlet allocation (LDA) based hybrid mixture modeling for effective topic adaptation. Our experiments on lecture ASR tasks show the effectiveness of joint unsupervised adaptation. We also report the performance obtained when only the n-gram LM or only the RNN LM is adapted.
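The abstract states that unsupervised adaptation reduces to adjusting mixture weights using a recognition hypothesis. A minimal Python sketch of that idea follows, assuming expectation-maximization (EM) re-estimation of the weights, which is a standard way to tune LM interpolation weights; the paper's exact procedure is not given in this record. It also assumes a hypothetical interface in which every component LM, n-gram or RNN, is a function returning P(word | history). The LDA step that would produce topic-specific components is not shown, and the toy unigram components, the names `adapt_mixture_weights`, `mixture_prob`, `hybrid_prob`, `unigram`, and the fixed interpolation weight `alpha` are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface (assumption): a component LM maps (history, word)
# to P(word | history). An n-gram or RNN LM could be wrapped this way.
ComponentLM = Callable[[Tuple[str, ...], str], float]


def adapt_mixture_weights(
    components: Sequence[ComponentLM],
    hypothesis: Sequence[str],
    n_iter: int = 20,
) -> List[float]:
    """EM re-estimation of mixture weights on a recognition hypothesis.

    The component LMs stay fixed; only the weights change (assumed procedure).
    """
    k = len(components)
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(n_iter):
        expected = [0.0] * k
        for i, word in enumerate(hypothesis):
            history = tuple(hypothesis[:i])
            # E-step: posterior responsibility of each component for this word
            joint = [w * lm(history, word) for w, lm in zip(weights, components)]
            total = sum(joint)
            if total == 0.0:
                continue  # no component assigns probability; skip the word
            for j in range(k):
                expected[j] += joint[j] / total
        norm = sum(expected)
        if norm == 0.0:
            break
        # M-step: new weights are the normalized expected counts
        weights = [c / norm for c in expected]
    return weights


def mixture_prob(weights, components, history, word) -> float:
    """P(word | history) under a weighted mixture of component LMs."""
    return sum(w * lm(history, word) for w, lm in zip(weights, components))


def hybrid_prob(ngram_w, ngram_lms, rnn_w, rnn_lms, history, word, alpha=0.5):
    """Linear interpolation of the adapted n-gram and RNN mixtures.

    alpha is a hypothetical fixed interpolation weight, not a value from the paper.
    """
    p_ngram = mixture_prob(ngram_w, ngram_lms, history, word)
    p_rnn = mixture_prob(rnn_w, rnn_lms, history, word)
    return alpha * p_ngram + (1.0 - alpha) * p_rnn


def unigram(table: Dict[str, float]) -> ComponentLM:
    """Toy stand-in for a topic-specific LM; ignores the history."""
    return lambda history, word: table.get(word, 0.0)


if __name__ == "__main__":
    # Hypothetical topic components, e.g. one per LDA-derived topic cluster.
    sports = unigram({"goal": 0.5, "match": 0.4, "model": 0.1})
    science = unigram({"goal": 0.1, "match": 0.1, "model": 0.8})
    hyp = ["model", "model", "goal"]  # toy 1-best recognition hypothesis
    w = adapt_mixture_weights([sports, science], hyp)
    print(w)  # weight shifts toward the science component
```

Because each component only needs to expose P(word | history), an RNN LM that conditions on the full history fits the same interface as an n-gram LM, which is what lets one weight-adaptation routine serve both model families in this sketch.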
Pages: 1538-1541
Page count: 4