Vari-gram language model based on word clustering

Cited by: 0

Authors:

Yuan Li-chi (袁里驰)

Affiliations:

[1] School of Information Science and Engineering, Central South University

[2] School of Information Technology, Jiangxi University of Finance and Economics

Source:

Journal of Central South University | 2012 / Vol. 19 / No. 04

Funding:

National Natural Science Foundation of China

Keywords:

word similarity; word clustering; statistical language model; vari-gram language model

DOI:

Not available

Chinese Library Classification (CLC) number:

TP311.13

Subject classification code:

1201

Abstract:

Category-based statistical language models are an important way to address the data sparsity problem, but they face two bottlenecks: 1) word clustering, since it is hard to find a clustering method that performs well at low computational cost; and 2) class-based methods lose prediction ability when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and a definition of word-set similarity is built on top of it. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good prediction ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
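To put the reported perplexity figures on a common scale, the short sketch below computes the relative reduction for each before/after pair quoted in the abstract. The function name `relative_reduction` and the labels in `reported` are illustrative choices, not taken from the paper; only the six perplexity values come from the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative perplexity reduction as a percentage."""
    return (before - after) / before * 100.0


# Perplexity pairs (before, after) as quoted in the abstract.
# The dictionary keys are descriptive labels added here for readability.
reported = {
    "similarity-based clustering vs. greedy clustering": (283.0, 218.0),
    "vari-gram vs. category-based (Chinese corpora)": (234.65, 219.14),
    "vari-gram vs. category-based (English corpora)": (195.56, 184.25),
}

for label, (before, after) in reported.items():
    print(f"{label}: {relative_reduction(before, after):.2f}% lower perplexity")
```

Under this reading, the clustering change accounts for a reduction of roughly 23%, while the vari-gram model gives reductions of about 6.6% (Chinese) and 5.8% (English) relative to the category-based model.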

Citation

Pages: 1057-1062

Page count: 6
