Word Clustering Algorithms Based on Word Similarity

Cited by: 2
Author
Yuan, Lichi [1]
Affiliation
[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330013, Peoples R China
Keywords
Word similarity; Word clustering; Statistical language model
DOI
10.1109/IHMSC.2015.36
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class-based statistical language models are an important way to address data sparseness, but they face two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without requiring a large amount of computation; and (2) class-based models tend to lose some predictive power when adapting to text from different domains. To address these problems, this paper defines word similarity using mutual information and, building on it, defines the similarity between word sets. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
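The abstract does not spell out its formulas, so the following is only a minimal sketch under stated assumptions: (1) each word is represented by its positive pointwise mutual information (PMI) with its immediate left and right neighbours, (2) word similarity is the cosine of those vectors, and (3) word-set similarity is the average pairwise word similarity (average linkage). The bottom-up merging loop and all function names (`context_pmi_vectors`, `cosine`, `set_similarity`, `cluster_words`) are hypothetical stand-ins, not the paper's algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def context_pmi_vectors(tokens):
    """Assumed similarity basis: map each word to the positive PMI it has with
    its left ("L:") and right ("R:") neighbours, from adjacent-word counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    first, second = Counter(), Counter()
    for (a, b), n in bigrams.items():
        first[a] += n   # how often a word starts a bigram
        second[b] += n  # how often a word ends a bigram
    vectors = {}
    for (a, b), n in bigrams.items():
        pmi = math.log(n * total / (first[a] * second[b]))
        if pmi > 0:  # keep only positively associated contexts
            vectors.setdefault(a, {})["R:" + b] = pmi  # b follows a
            vectors.setdefault(b, {})["L:" + a] = pmi  # a precedes b
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def set_similarity(s, t, word_sim):
    """Assumed word-set similarity: average pairwise similarity of two disjoint sets."""
    return sum(word_sim(a, b) for a in s for b in t) / (len(s) * len(t))


def cluster_words(tokens, num_classes):
    """Hypothetical bottom-up clustering: repeatedly merge the two most similar
    word sets until num_classes (assumed >= 1) sets remain."""
    vectors = context_pmi_vectors(tokens)
    words = sorted(set(tokens))
    sim = {
        (a, b): cosine(vectors.get(a, {}), vectors.get(b, {}))
        for a, b in combinations(words, 2)  # a < b for every key
    }

    def word_sim(a, b):
        return sim[(a, b)] if a < b else sim[(b, a)]

    clusters = [[w] for w in words]
    while len(clusters) > num_classes:
        _, i, j = max(
            (set_similarity(clusters[i], clusters[j], word_sim), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        clusters[i].extend(clusters.pop(j))  # i < j, so index i is unaffected
    return clusters


if __name__ == "__main__":
    toy = "the cat sat . the dog sat . we eat rice . we eat bread .".split()
    print(cluster_words(toy, 7))
```

On this toy corpus, "cat" and "dog" share identical contexts, as do "rice" and "bread", so the two merges group those pairs. The naive loop recomputes every pair at each step and is only suitable for toy vocabularies; the computational cost the abstract highlights would call for cached or incrementally updated set similarities.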
Pages: 21-24
Page count: 4
Related papers (50 in total)
  • [1] A New Word Clustering Algorithm Based on Word Similarity
    YUAN Lichi
    Chinese Journal of Electronics, 2017, 26 (06) : 1221 - 1226
  • [2] A New Word Clustering Algorithm Based on Word Similarity
    Yuan Lichi
    CHINESE JOURNAL OF ELECTRONICS, 2017, 26 (06) : 1221 - 1226
  • [3] Word Clustering based on Word2vec and Semantic Similarity
    Luo Jie
    Wang Qinglin
    Li Yuan
    2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 517 - 521
  • [4] Algorithms for bigram and trigram word clustering
    Martin, S
    Liermann, J
    Ney, H
    SPEECH COMMUNICATION, 1998, 24 (01) : 19 - 37
  • [5] A CLUSTERING AND WORD SIMILARITY BASED APPROACH FOR IDENTIFYING PRODUCT FEATURE WORDS
    Suryadi, Dedy
    Kim, Harrison
    DS87-6: PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN (ICED 17) VOL 6: DESIGN INFORMATION AND KNOWLEDGE, 2017, : 71 - 80
  • [6] Word clustering based on similarity and vari-gram language model
    Yuan, LC
    Zhong, YX
    ICCC2004: Proceedings of the 16th International Conference on Computer Communication Vol 1 and 2, 2004, : 1222 - 1226
  • [7] Clustering words for statistical language models based on contextual word similarity
    Farhat, A
    Isabelle, JF
    O'Shaughnessy, D
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 180 - 183
  • [8] Word sense disambiguation based on word sense clustering
    Anaya-Sanchez, Henry
    Pons-Porrata, Aurora
    Berlanga-Llavori, Rafael
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA-SBIA 2006, PROCEEDINGS, 2006, 4140 : 472 - 481
  • [9] Similarity Word-Sequence Kernels for Sentence Clustering
    Andres-Ferrer, Jesus
    Sanchis-Trilles, German
    Casacuberta, Francisco
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 610 - 619
  • [10] AUDIO WORD SIMILARITY FOR CLUSTERING WITH ZERO RESOURCES BASED ON ITERATIVE HMM CLASSIFICATION
    Royer, Amelie
    Gravier, Guillaume
    Claveau, Vincent
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5340 - 5344