UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Cited: 0
Authors
Haidar, Md. Akmal [1 ]
O'Shaughnessy, Douglas [1 ]
Institutions
[1] INRS EMT, Montreal, PQ H5A 1K6, Canada
Keywords
Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation;
DOI
Not available
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper, we introduce the weighting of topic models in mixture language model adaptation using the n-grams of the topic models. Topic clusters are formed by a hard-clustering method that assigns each document to a single topic, namely the topic contributing the largest number of that document's words in Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics generated by hard clustering are then used to compute the mixture weights of the component topic models. Instead of using all words of the training vocabulary, the LDA analysis uses a subset of words selected by incorporating some information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) compared with a unigram weighting approach used in the literature.
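The two steps the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the per-document topic-word counts are assumed to come from a prior LDA pass, the bigram model is a simple add-one-smoothed stand-in for the paper's topic n-gram models, and the mixture weights are computed by normalizing each topic model's likelihood on an adaptation text.

```python
# Hypothetical sketch of (1) hard-clustering each training document to the LDA
# topic that contributes the most of its words, and (2) weighting per-topic
# n-gram (here bigram) models by their likelihood on an adaptation text.
from collections import Counter


def hard_cluster(doc_topic_word_counts):
    """doc_topic_word_counts[d][k] = number of words in document d assigned to
    topic k by LDA. Each document goes to the topic with the maximum count."""
    return {d: max(counts, key=counts.get)
            for d, counts in doc_topic_word_counts.items()}


def train_bigram(docs):
    """Add-one-smoothed bigram model over the documents of one topic cluster.
    Returns a probability function prob(w_prev, w)."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for doc in docs:
        toks = ["<s>"] + doc.split()
        vocab.update(toks)
        unigrams.update(toks[:-1])          # history counts
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)

    def prob(w_prev, w):
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)
    return prob


def mixture_weights(topic_models, adapt_text):
    """Weight each component topic model by its normalized likelihood of the
    adaptation text -- the n-gram weighting idea, as opposed to weighting by
    unigram statistics alone."""
    toks = ["<s>"] + adapt_text.split()
    scores = []
    for prob in topic_models:
        p = 1.0
        for w_prev, w in zip(toks, toks[1:]):
            p *= prob(w_prev, w)
        scores.append(p)
    total = sum(scores)
    return [s / total for s in scores]
```

A topic model trained on text resembling the adaptation data receives a larger mixture weight, which is the intended effect of replacing unigram weighting with n-gram weighting.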
Pages: 857-860
Page count: 4
Related papers
50 total
  • [1] A Corpus Based Unsupervised Bangla Word Stemming Using N-Gram Language Model
    Urmi, Tapashee Tabassum
    Jammy, Jasmine Jahan
    Ismail, Sabir
    2016 5TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS AND VISION (ICIEV), 2016, : 824 - 828
  • [2] Bayesian estimation methods for N-gram language model adaptation
    Federico, M
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 240 - 243
  • [3] TOPIC N-GRAM COUNT LANGUAGE MODEL ADAPTATION FOR SPEECH RECOGNITION
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 165 - 169
  • [4] Similar N-gram Language Model
    Gillot, Christian
    Cerisara, Christophe
    Langlois, David
    Haton, Jean-Paul
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1824 - 1827
  • [5] Task adaptation using MAP estimation in N-gram language modeling
    Masataki, H
    Sagisaka, Y
    Hisaki, K
    Kawahara, T
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 783 - 786
  • [6] Efficient MDI Adaptation for n-gram Language Models
    Huang, Ruizhe
    Li, Ke
    Arora, Ashish
    Povey, Daniel
    Khudanpur, Sanjeev
    INTERSPEECH 2020, 2020, : 4916 - 4920
  • [7] Language model adaptation for fixed phrases by amplifying partial N-gram sequences
    Akiba, Tomoyosi
    Itou, Katunobu
    Fuji, Atsushi
    Systems and Computers in Japan, 2007, 38 (04): : 63 - 73
  • [8] A New Estimate of the n-gram Language Model
    Aouragh, Si Lhoussain
    Yousfi, Abdellah
    Laaroussi, Saida
    Gueddah, Hicham
    Nejja, Mohammed
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 211 - 215
  • [9] Development of the N-gram Model for Azerbaijani Language
    Bannayeva, Aliya
    Aslanov, Mustafa
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [10] IMPROVEMENTS TO N-GRAM LANGUAGE MODEL USING TEXT GENERATED FROM NEURAL LANGUAGE MODEL
    Suzuki, Masayuki
    Itoh, Nobuyasu
    Nagano, Tohru
    Kurata, Gakuto
    Thomas, Samuel
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7245 - 7249