UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Cited by: 0
Authors
Haidar, Md. Akmal [1 ]
O'Shaughnessy, Douglas [1 ]
Affiliation
[1] INRS EMT, Montreal, PQ H5A 1K6, Canada
Keywords
Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation
DOI
Not available
CLC classification
TP301 [Theory, Methods]
Discipline code
081202
Abstract
In this paper, we introduce a method for weighting the topic models in mixture language model adaptation that uses the n-grams of the topic models. Topic clusters are formed by a hard-clustering method that assigns one topic to each document, based on which topic contributes the maximum number of that document's words in Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topic clusters produced by hard clustering are then used to compute the mixture weights of the component topic models. Instead of the full training vocabulary, the LDA analysis is restricted to a subset of words selected with information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) over a unigram weighting approach used in the literature.
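To make the clustering and weighting steps concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes gensim's LdaModel for the LDA analysis, uses bigrams (n = 2) for brevity, and the helper names hard_cluster, bigram_counts, and ngram_mixture_weights are illustrative. The paper's exact weight estimator may differ; this version simply normalizes, per topic, how much of an adaptation text's bigram mass each hard cluster accounts for.

```python
# Sketch of LDA hard clustering and n-gram-based mixture weighting.
# Assumes gensim for LDA; helper names are illustrative, not from the paper.
from collections import Counter, defaultdict
from gensim import corpora, models

def hard_cluster(docs, num_topics=4):
    """Assign each document to the topic that accounts for the
    largest number of its word tokens under the LDA posterior."""
    dictionary = corpora.Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(bows, num_topics=num_topics, id2word=dictionary)
    clusters = defaultdict(list)
    for doc, bow in zip(docs, bows):
        # per_word_topics=True also returns, per word id, the topics
        # ordered by relevance; vote with the top topic, weighted by
        # how often the word occurs in the document.
        _, word_topics, _ = lda.get_document_topics(bow, per_word_topics=True)
        counts = dict(bow)
        votes = Counter()
        for word_id, topics in word_topics:
            if topics:
                votes[topics[0]] += counts[word_id]
        topic = votes.most_common(1)[0][0] if votes else 0
        clusters[topic].append(doc)
    return clusters

def bigram_counts(cluster_docs):
    """Bigram counts for one topic cluster (n = 2 for brevity)."""
    c = Counter()
    for doc in cluster_docs:
        c.update(zip(doc, doc[1:]))
    return c

def ngram_mixture_weights(clusters, text):
    """Score each topic by how many of the adaptation text's bigrams
    its cluster contains, then normalize into mixture weights (one
    plausible reading of n-gram weighting)."""
    per_topic = {k: bigram_counts(v) for k, v in clusters.items()}
    scores = {k: sum(c[bg] for bg in zip(text, text[1:]))
              for k, c in per_topic.items()}
    total = sum(scores.values()) or 1
    return {k: s / total for k, s in scores.items()}
```

The resulting weights would then interpolate the component topic language models, e.g. P(w|h) = sum_j lambda_j * P_j(w|h), with the lambda_j taken from ngram_mixture_weights.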
Pages: 857-860 (4 pages)