Vari-gram language model based on word clustering

Author
Yuan Li-chi (袁里驰)
Affiliations
[1] School of Information Science and Engineering, Central South University
[2] School of Information Technology, Jiangxi University of Finance and Economics
Funding
National Natural Science Foundation of China
Keywords
word similarity; word clustering; statistical language model; vari-gram language model
DOI
Not available
Chinese Library Classification number
TP311.13
Discipline classification code
1201
Abstract
Category-based statistical language models are an important method for addressing the sparse data problem, but they face two bottlenecks: 1) word clustering, since it is hard to find a clustering method that performs well at low computational cost; and 2) domain adaptation, since class-based methods tend to lose predictive ability when applied to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and a definition of word set similarity is derived from it. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
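The abstract reports its results as perplexity under a category-based model. As a rough illustration of what that metric measures, the following minimal sketch computes the perplexity of a textbook class-based bigram model, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i). This is an assumed standard formulation, not the paper's similarity-based clustering or its vari-gram model; the function names, the toy corpus, and the hand-assigned word classes are all hypothetical, and no smoothing is applied.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Maximum-likelihood estimates of P(c_i | c_{i-1}) and P(w | c).

    Assumption: a textbook class-based bigram factorisation, not the
    paper's exact model. No smoothing is applied.
    """
    classes = [word2class[w] for w in tokens]
    class_count = Counter(classes)
    word_count = Counter(tokens)
    prev_count = Counter(classes[:-1])
    bigram_count = Counter(zip(classes[:-1], classes[1:]))
    # Class transition probabilities P(c_i | c_{i-1}).
    trans = {(a, b): n / prev_count[a] for (a, b), n in bigram_count.items()}
    # Word emission probabilities P(w | c).
    emit = {w: n / class_count[word2class[w]] for w, n in word_count.items()}
    return trans, emit


def perplexity(tokens, word2class, trans, emit):
    """Perplexity of tokens[1:] given tokens[0] under the class bigram model.

    Unseen class transitions or words raise KeyError because the sketch
    has no smoothing.
    """
    log_prob = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        p = trans[(word2class[prev], word2class[cur])] * emit[cur]
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical toy corpus and hand-assigned classes, for illustration only.
tokens = "the cat sat the dog sat".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V"}

trans, emit = train_class_bigram(tokens, word2class)
ppl = perplexity(tokens, word2class, trans, emit)
print(f"perplexity = {ppl:.4f}")
```

Lower perplexity means the model assigns higher probability to the evaluated text, which is the sense in which the reductions reported in the abstract count as improvements. A real evaluation would use held-out text and a smoothed model.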
Pages: 1057-1062
Number of pages: 6
Source
Yuan Li-chi. Vari-gram language model based on word clustering [J]. Journal of Central South University, 2012, 19 (04): 1057-1062