Word clustering based on similarity and vari-gram language model

Authors
Yuan, LC [1]
Zhong, YX [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Coll Informat Engn, Beijing 100876, Peoples R China
Keywords
word clustering; statistical language model; vari-gram
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Class-based statistical language models are an important way to address the sparse-data problem. However, this approach has two bottlenecks: (1) word clustering, where it is hard to find a method that performs well without a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. This paper attempts to address both problems. It presents a novel definition of word similarity and, based on it, defines the similarity of word sets. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for creating the vari-gram model.
Pages: 1222-1226
Number of pages: 5