UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Authors
Haidar, Md. Akmal [1]
O'Shaughnessy, Douglas [1]
Affiliations
[1] INRS EMT, Montreal, PQ H5A 1K6, Canada
Keywords
Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper introduces a way to weight the topic models in mixture language model adaptation using the n-grams of the topic models. Topic clusters are formed by hard clustering: each document is assigned to exactly one topic, namely the topic from which the largest number of the document's words were drawn in the Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics produced by this hard clustering are then used to compute the mixture weights of the component topic models. Rather than running LDA over the full training vocabulary, the analysis uses a subset of words selected with information retrieval techniques. The proposed n-gram weighting approach yields a significant reduction in perplexity and word error rate (WER) compared with a unigram weighting approach used in the literature.
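The two steps named in the abstract, hard clustering and n-gram based mixture weighting, can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the scoring rule here (the summed relative frequency of the adaptation text's n-grams in each topic cluster, normalized to sum to one) is an assumption made for illustration, not the authors' method. All function names are hypothetical, and the per-word topic assignments are assumed to come from an LDA run that is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hard_cluster(word_topic_assignments):
    """Assign each document the topic that most of its words were assigned to.

    Each element is the list of per-word topic ids for one non-empty document.
    """
    return [Counter(z).most_common(1)[0][0] for z in word_topic_assignments]


def topic_ngram_counts(docs, labels, num_topics, n=2):
    """Pool the n-gram counts of all documents in each topic cluster."""
    counts = [Counter() for _ in range(num_topics)]
    for tokens, k in zip(docs, labels):
        counts[k].update(ngrams(tokens, n))
    return counts


def mixture_weights(text, counts, n=2):
    """Assumed scoring rule: relative frequency of the text's n-grams per topic."""
    scores = []
    for c in counts:
        total = sum(c.values())
        hits = sum(c[g] for g in ngrams(text, n))
        scores.append(hits / total if total else 0.0)
    norm = sum(scores)
    if norm == 0:
        # No n-gram overlap with any topic: fall back to uniform weights.
        return [1.0 / len(counts)] * len(counts)
    return [s / norm for s in scores]


# Toy data: three documents and their per-word topic ids (assumed LDA output).
docs = [["stock", "market", "rises"],
        ["team", "wins", "match"],
        ["stock", "market", "falls"]]
assignments = [[0, 0, 1], [1, 1, 1], [0, 0, 0]]

labels = hard_cluster(assignments)
counts = topic_ngram_counts(docs, labels, num_topics=2)
print(mixture_weights(["stock", "market", "team", "wins"], counts))
```

The resulting weights would serve as the interpolation coefficients of the component topic language models, so that the adapted probability is the weighted sum of the per-topic n-gram probabilities.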
Pages: 857-860
Page count: 4