Word clustering based on similarity and vari-gram language model

Cited by: 0
Authors
Yuan, LC [1]
Zhong, YX [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Coll Informat Engn, Beijing 100876, Peoples R China
Keywords
word clustering; statistical language model; vari-gram
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The class-based statistical language model is an important method for addressing the sparse-data problem. However, this model has two bottlenecks: (1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; and (2) class-based methods always lose some of the predictive ability needed to adapt to text from different domains. This paper attempts to address both problems. It presents a novel definition of word similarity and, based on it, defines the similarity of word sets. Experiments show that the word clustering algorithm based on similarity outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for creating the vari-gram model.
Pages: 1222-1226
Number of pages: 5