Word clustering based on similarity and vari-gram language model

Cited by: 0
Authors
Yuan, LC [1]
Zhong, YX [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Coll Informat Engn, Beijing 100876, Peoples R China
Keywords
word clustering; statistical language model; vari-gram
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Class-based statistical language models are an important way to address the sparse-data problem. However, such models face two bottlenecks: (1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; and (2) loss of predictive ability, since class-based methods always give up some prediction power when adapting to text from different domains. This paper attempts to address both problems. It presents a novel definition of word similarity and, based on it, defines the similarity of word sets. Experiments show that the word clustering algorithm based on this similarity outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for constructing the vari-gram model.
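
As a rough illustration of why class-based models help with sparse data, the sketch below implements the standard class-based bigram factorization, P(w_i | w_{i-1}) ≈ P(w_i | c_i) · P(c_i | c_{i-1}). It is not the authors' method: the abstract does not give their similarity definition or clustering algorithm, so the word-to-class mapping here is hand-made, and the function names (`train_class_bigram`, `class_bigram_prob`) and the toy corpus are assumptions for illustration only.

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Collect the counts needed for a class-based bigram model."""
    classes = [word2class[w] for w in tokens]
    word_count = Counter(tokens)
    class_count = Counter(classes)
    # Class-to-class transition counts over adjacent positions.
    class_bigram = Counter(zip(classes, classes[1:]))
    # How often each class appears as a history (every position but the last).
    history_count = Counter(classes[:-1])
    return word_count, class_count, class_bigram, history_count

def class_bigram_prob(prev, word, word2class, model):
    """Estimate P(word | prev) as P(word | class(word)) * P(class(word) | class(prev))."""
    word_count, class_count, class_bigram, history_count = model
    c_prev, c_word = word2class[prev], word2class[word]
    if history_count[c_prev] == 0 or class_count[c_word] == 0:
        return 0.0
    p_word_given_class = word_count[word] / class_count[c_word]
    p_class_transition = class_bigram[(c_prev, c_word)] / history_count[c_prev]
    return p_word_given_class * p_class_transition

# Toy data: the class assignment is hand-made (hypothetical), not learned.
word2class = {
    "monday": "DAY", "tuesday": "DAY",
    "on": "PREP",
    "meet": "VERB", "call": "VERB",
}
tokens = "meet on monday call on monday meet on tuesday".split()
model = train_class_bigram(tokens, word2class)

# The word bigram ("tuesday", "call") never occurs in the toy corpus,
# yet the class-level statistics still give it a nonzero probability.
p_unseen = class_bigram_prob("tuesday", "call", word2class, model)
```

In this toy corpus a plain word bigram model would assign zero probability to "call" following "tuesday", while the class-based estimate borrows statistics from the DAY-to-VERB transition. The cost is the one the abstract points to: words in the same class become harder to tell apart, so some predictive ability is lost.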
Pages: 1222 - 1226
Number of pages: 5
Related papers
50 in total
  • [31] New word clustering method for building n-gram language models in continuous speech recognition systems
    Bahrani, Mohammad
    Sameti, Hossein
    Hafezi, Nazila
    Momtazi, Saeedeh
    NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027 : 286 - 293
  • [32] A clustering-based topic model using word networks and word embeddings
    Mu, Wenchuan
    Lim, Kwan Hui
    Liu, Junhua
    Karunasekera, Shanika
    Falzon, Lucia
    Harwood, Aaron
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [33] Short Text Clustering based on Word Semantic Graph with Word Embedding Model
    Jinarat, Supakpong
    Manaskasemsak, Bundit
    Rungsawang, Arnon
    2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 1427 - 1432
  • [34] A clustering-based topic model using word networks and word embeddings
    Wenchuan Mu
    Kwan Hui Lim
    Junhua Liu
    Shanika Karunasekera
    Lucia Falzon
    Aaron Harwood
    Journal of Big Data, 9
  • [35] Semantic clustering based relevance language model
    Pu Q.
    He D.
    Information Technology Journal, 2010, 9 (02) : 236 - 246
  • [36] Word Similarity Based Model for Tweet Stream Prospective Notification
    Chellal, Abdelhamid
    Boughanem, Mohand
    Dousset, Bernard
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 655 - 661
  • [37] Measuring Word Similarity Based on Pattern Vector Space Model
    Liu, Lei
    Zhong, Maoshang
    Lu, Ruzhan
    2009 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, VOL III, PROCEEDINGS, 2009, : 72 - +
  • [38] Similarity language model
    Gillot, Christian
    Cerisara, Christophe
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1468 - 1471
  • [39] A mathematical model of similarity and clustering
    Sun, FS
    Tzeng, CH
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, PROCEEDINGS, 2004, : 460 - 464
  • [40] Comparing neural- and N-gram-based language models for word segmentation
    Doval, Yerai
    Gomez-Rodriguez, Carlos
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2019, 70 (02) : 187 - 197