UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Cited by: 0
Authors
Haidar, Md. Akmal [1]
O'Shaughnessy, Douglas [1]
Affiliation
[1] INRS EMT, Montreal, PQ H5A 1K6, Canada
Keywords
Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
In this paper, we introduce a method for weighting topic models in mixture language model adaptation using the n-grams of the topic models. Topic clusters are formed with a hard-clustering method that assigns each document to a single topic: the topic from which the largest number of the document's words are drawn in Latent Dirichlet Allocation (LDA) analysis. The n-grams of each topic cluster produced by hard clustering are then used to compute the mixture weights of the component topic models. Rather than using the entire training vocabulary, LDA analysis is performed on a subset of words selected with information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) compared with a unigram weighting approach used in the literature.
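The abstract describes two steps: hard clustering of documents from LDA output, and mixture weights computed from the n-grams of each topic cluster. The sketch below illustrates one possible reading of those steps in Python. The per-token topic assignments, the scoring rule in `ngram_mixture_weights`, and all function names are assumptions for illustration, not the authors' published formulas. The resulting weights λ_k would then combine the topic n-gram models in the usual linear mixture, P(w|h) = Σ_k λ_k P_k(w|h).

```python
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def hard_cluster(topic_assignments: Sequence[Sequence[int]], num_topics: int) -> List[int]:
    """Assign each document to the single topic that generated most of its words.

    topic_assignments[d][i] is the LDA topic of the i-th token of document d,
    assumed to come from an already-fitted LDA model (e.g. a Gibbs sampler).
    """
    labels = []
    for z in topic_assignments:
        counts = Counter(z)
        # Ties go to the lowest topic id so the clustering is deterministic.
        labels.append(max(range(num_topics), key=lambda k: (counts[k], -k)))
    return labels


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cluster_ngram_counts(docs: Sequence[Sequence[str]], labels: Sequence[int],
                         num_topics: int, n: int) -> List[Counter]:
    """Pool the n-gram counts of every document in each hard topic cluster."""
    counts = [Counter() for _ in range(num_topics)]
    for tokens, k in zip(docs, labels):
        counts[k].update(ngrams(tokens, n))
    return counts


def ngram_mixture_weights(adapt_tokens: Sequence[str], cluster_counts: List[Counter],
                          n: int) -> List[float]:
    """Hypothetical n-gram weighting: score topic k by the relative frequency,
    within cluster k, of the n-grams observed in the adaptation text, then
    normalize the scores so the mixture weights sum to one."""
    observed = Counter(ngrams(adapt_tokens, n))
    scores = []
    for counts in cluster_counts:
        total = sum(counts.values())
        scores.append(
            sum(c * counts[g] for g, c in observed.items()) / total if total else 0.0
        )
    norm = sum(scores)
    if norm == 0:  # no n-gram overlap at all: fall back to uniform weights
        return [1.0 / len(cluster_counts)] * len(cluster_counts)
    return [s / norm for s in scores]


# Toy data; the topic assignments are made up, not real LDA output.
docs = [
    "the stock market fell sharply today".split(),
    "the team won the final match".split(),
    "the stock price rose on strong earnings".split(),
]
z = [
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
labels = hard_cluster(z, num_topics=2)
counts = cluster_ngram_counts(docs, labels, 2, n=2)
weights = ngram_mixture_weights("the stock market won the final".split(), counts, n=2)
print(labels, [round(w, 3) for w in weights])  # [0, 1, 0] [0.405, 0.595]
```

Dividing each overlap score by the cluster's total n-gram count keeps a large cluster from winning simply because it contains more text; the paper's actual normalization may differ.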
Pages: 857-860
Number of pages: 4