UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Cited: 0
Authors
Haidar, Md. Akmal [1 ]
O'Shaughnessy, Douglas [1 ]
Institutions
[1] INRS EMT, Montreal, PQ H5A 1K6, Canada
Keywords
Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation;
DOI
Not available
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper, we introduce the weighting of topic models in mixture language model adaptation using the n-grams of the topic models. Topic clusters are formed by a hard-clustering method that assigns each document to a single topic, namely the topic contributing the largest number of that document's words in Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics generated by hard clustering are then used to compute the mixture weights of the component topic models. Instead of using all words of the training vocabulary, the LDA analysis uses a subset of words selected by incorporating some information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) compared with a unigram weighting approach used in the literature.
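The two steps the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the per-document topic-word counts are assumed to come from a prior LDA pass, the bigram model is a simple add-one-smoothed stand-in for the paper's topic n-gram models, and the mixture weights are computed by normalizing each topic model's likelihood on an adaptation text.

```python
# Hypothetical sketch of (1) hard-clustering each training document to the LDA
# topic that contributes the most of its words, and (2) weighting per-topic
# n-gram (here bigram) models by their likelihood on an adaptation text.
from collections import Counter


def hard_cluster(doc_topic_word_counts):
    """doc_topic_word_counts[d][k] = number of words in document d assigned to
    topic k by LDA. Each document goes to the topic with the maximum count."""
    return {d: max(counts, key=counts.get)
            for d, counts in doc_topic_word_counts.items()}


def train_bigram(docs):
    """Add-one-smoothed bigram model over the documents of one topic cluster.
    Returns a probability function prob(w_prev, w)."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for doc in docs:
        toks = ["<s>"] + doc.split()
        vocab.update(toks)
        unigrams.update(toks[:-1])          # history counts
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)

    def prob(w_prev, w):
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)
    return prob


def mixture_weights(topic_models, adapt_text):
    """Weight each component topic model by its normalized likelihood of the
    adaptation text -- the n-gram weighting idea, as opposed to weighting by
    unigram statistics alone."""
    toks = ["<s>"] + adapt_text.split()
    scores = []
    for prob in topic_models:
        p = 1.0
        for w_prev, w in zip(toks, toks[1:]):
            p *= prob(w_prev, w)
        scores.append(p)
    total = sum(scores)
    return [s / total for s in scores]
```

A topic model trained on text resembling the adaptation data receives a larger mixture weight, which is the intended effect of replacing unigram weighting with n-gram weighting.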
Pages: 857-860
Page count: 4
Related papers
50 total
  • [1] A Corpus Based Unsupervised Bangla Word Stemming Using N-Gram Language Model
    Urmi, Tapashee Tabassum
    Jammy, Jasmine Jahan
    Ismail, Sabir
    2016 5TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS AND VISION (ICIEV), 2016, : 824 - 828
  • [2] Bayesian estimation methods for N-gram language model adaptation
    Federico, M
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 240 - 243
  • [3] TOPIC N-GRAM COUNT LANGUAGE MODEL ADAPTATION FOR SPEECH RECOGNITION
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 165 - 169
  • [4] Similar N-gram Language Model
    Gillot, Christian
    Cerisara, Christophe
    Langlois, David
    Haton, Jean-Paul
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1824 - 1827
  • [5] Task adaptation using MAP estimation in N-gram language modeling
    Masataki, H
    Sagisaka, Y
    Hisaki, K
    Kawahara, T
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 783 - 786
  • [6] Efficient MDI Adaptation for n-gram Language Models
    Huang, Ruizhe
    Li, Ke
    Arora, Ashish
    Povey, Daniel
    Khudanpur, Sanjeev
    INTERSPEECH 2020, 2020, : 4916 - 4920
  • [7] Language model adaptation for fixed phrases by amplifying partial N-gram sequences
    Akiba, Tomoyosi
    Itou, Katunobu
    Fuji, Atsushi
    Systems and Computers in Japan, 2007, 38 (04): : 63 - 73
  • [8] A New Estimate of the n-gram Language Model
    Aouragh, Si Lhoussain
    Yousfi, Abdellah
    Laaroussi, Saida
    Gueddah, Hicham
    Nejja, Mohammed
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 211 - 215
  • [9] Development of the N-gram Model for Azerbaijani Language
    Bannayeva, Aliya
    Aslanov, Mustafa
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [10] IMPROVEMENTS TO N-GRAM LANGUAGE MODEL USING TEXT GENERATED FROM NEURAL LANGUAGE MODEL
    Suzuki, Masayuki
    Itoh, Nobuyasu
    Nagano, Tohru
    Kurata, Gakuto
    Thomas, Samuel
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7245 - 7249