Joint Unsupervised Adaptation of N-gram and RNN Language Models via LDA-based Hybrid Mixture Modeling

Cited by: 0
Authors
Masumura, Ryo [1 ]
Asami, Taichi [1 ]
Masataki, Hirokazu [1 ]
Aono, Yushi [1 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
This paper reports an initial study of unsupervised adaptation that assumes the simultaneous use of both n-gram and recurrent neural network (RNN) language models (LMs) in automatic speech recognition (ASR). Combining n-gram and RNN LMs is known to be more effective for ASR than using either model alone. However, while various unsupervised adaptation methods specific to either n-gram LMs or RNN LMs have been examined, no method that simultaneously adapts both has been presented. To handle the different LMs in a unified unsupervised adaptation framework, our key idea is to introduce mixture modeling for both n-gram LMs and RNN LMs. Mixture modeling can handle multiple LMs at once, and unsupervised adaptation is accomplished simply by adjusting the mixture weights using a recognition hypothesis of the input speech. This paper proposes joint unsupervised adaptation achieved by hybrid mixture modeling that uses both n-gram mixture models and RNN mixture models, and presents latent Dirichlet allocation (LDA) based hybrid mixture modeling for effective topic adaptation. Experiments on lecture ASR tasks show the effectiveness of joint unsupervised adaptation; we also report results when only the n-gram LM or only the RNN LM is adapted.
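To make the mixture-based adaptation scheme concrete, the following Python sketch illustrates the idea under stated assumptions: each component LM is a callable returning P(w | h), the mixture weights are re-estimated on the ASR hypothesis by plain EM (a simple stand-in for the paper's LDA-based topic weighting, which is not reproduced here), and the adapted n-gram and RNN mixtures are combined by linear interpolation with a hypothetical weight alpha. The names mixture_prob, adapt_weights, hybrid_prob, alpha, and iters are illustrative, not taken from the paper.

    # Minimal sketch of unsupervised mixture-weight adaptation (assumptions:
    #  - each component LM exposes a callable P(w | h);
    #  - weights are re-estimated on the recognition hypothesis by EM,
    #    standing in for the paper's LDA-based topic weighting;
    #  - the adapted n-gram and RNN mixtures are linearly interpolated).
    from typing import Callable, List, Sequence, Tuple

    LM = Callable[[str, Tuple[str, ...]], float]  # P(w | h)

    def mixture_prob(lms: Sequence[LM], weights: Sequence[float],
                     word: str, history: Tuple[str, ...]) -> float:
        """P_mix(w|h) = sum_k lambda_k * P_k(w|h)."""
        return sum(l * lm(word, history) for l, lm in zip(weights, lms))

    def adapt_weights(lms: Sequence[LM], hypothesis: List[str],
                      iters: int = 10) -> List[float]:
        """Re-estimate mixture weights on an ASR hypothesis by EM."""
        k = len(lms)
        lam = [1.0 / k] * k                       # start from uniform weights
        if not hypothesis:
            return lam
        for _ in range(iters):
            resp_sum = [0.0] * k
            for i, w in enumerate(hypothesis):
                h = tuple(hypothesis[:i])
                probs = [lam[j] * lms[j](w, h) for j in range(k)]
                z = sum(probs) or 1e-12
                for j in range(k):                # E-step: posterior responsibility
                    resp_sum[j] += probs[j] / z
            lam = [r / len(hypothesis) for r in resp_sum]   # M-step
        return lam

    def hybrid_prob(ngram_lms: Sequence[LM], ngram_w: Sequence[float],
                    rnn_lms: Sequence[LM], rnn_w: Sequence[float],
                    word: str, history: Tuple[str, ...],
                    alpha: float = 0.5) -> float:
        """Interpolate the adapted n-gram mixture with the adapted RNN mixture."""
        p_ng = mixture_prob(ngram_lms, ngram_w, word, history)
        p_rnn = mixture_prob(rnn_lms, rnn_w, word, history)
        return alpha * p_ng + (1.0 - alpha) * p_rnn

In the paper's setting, the component LMs would be topic-dependent n-gram and RNN LMs and the weights would be derived from LDA topic inference on the hypothesis; the EM loop above is only the simplest possible weight-adjustment choice used here for illustration.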
Pages: 1538-1541
Number of pages: 4