Chinese text clustering algorithm based k-means

Cited by: 14
Authors
Yao, Mingyu [1]
Pi, Dechang [1]
Cong, Xiangxiang [2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Informat Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
[2] E China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
Keywords
text cluster; k-means; Chinese text;
DOI
10.1016/j.phpro.2012.05.066
Chinese Library Classification (CLC) number
Q6 [Biophysics];
Discipline classification code
071011;
Abstract
Text clustering is an important method in text mining. This paper focuses on the process of Chinese text clustering based on k-means. Experiments showed that the new center of a cluster is easily affected by isolated texts. To address this, the average similarity of a cluster is used as a parameter and multiplied by a coefficient between 0.75 and 1.25 to obtain a similarity threshold. The texts whose similarity to the original cluster center is greater than or equal to the threshold are collected into a candidate set, and the cluster center is then updated with the center of the candidate set. The experiments show that the improved method increases purity and F value by about 10 percent on average over the original method. (C) 2012 Published by Elsevier B.V. Selection and/or peer review under responsibility of ICMPBE International Committee.
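The center-update step described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes texts are already represented as numeric vectors (for example TF-IDF), that similarity is cosine similarity, and that the "average similarity of a cluster" means the mean similarity of its members to the current center. The names `cosine_sim`, `update_center`, and `coeff` are hypothetical.

```python
import numpy as np

def cosine_sim(vectors, center):
    """Cosine similarity between each row of `vectors` and `center` (assumed measure)."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(center)
    # Guard against zero vectors to avoid division by zero.
    return (vectors @ center) / np.where(norms == 0, 1.0, norms)

def update_center(members, old_center, coeff=1.0):
    """Recompute a cluster center from the members that pass the similarity threshold.

    `coeff` is the multiplier from the abstract (between 0.75 and 1.25).
    """
    sims = cosine_sim(members, old_center)
    threshold = coeff * sims.mean()          # similarity threshold
    candidates = members[sims >= threshold]  # candidate set
    # Assumption: if no text passes the threshold, keep the old center.
    if len(candidates) == 0:
        return old_center
    return candidates.mean(axis=0)

# Three similar texts plus one isolated text in the same cluster.
members = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
old_center = members.mean(axis=0)            # plain k-means center
new_center = update_center(members, old_center, coeff=1.0)
print(old_center, new_center)
```

In this toy example the isolated text `[0.0, 1.0]` falls below the threshold, so the updated center is computed from the three similar texts only and is pulled less toward the outlier than the plain k-means mean.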
Pages: 301-307
Page count: 7
Related papers
50 in total
  • [1] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    [J]. 2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011, : 90 - 93
  • [2] A new Chinese text clustering algorithm based on WRD and improved K-means
    Cui, Zicai
    Zhong, Bocheng
    Bai, Chen
    [J]. INTELLIGENT DATA ANALYSIS, 2023, 27 (04) : 1205 - 1220
  • [3] Weighted k-Means Algorithm Based Text Clustering
    Chen, Xiuguo
    Yin, Wensheng
    Tu, Pinghui
    Zhang, Hengxi
    [J]. IEEC 2009: FIRST INTERNATIONAL SYMPOSIUM ON INFORMATION ENGINEERING AND ELECTRONIC COMMERCE, PROCEEDINGS, 2009, : 51 - +
  • [4] Design and application of a text clustering algorithm based on parallelized k-means clustering
    Wang H.
    Zhou C.
    Li L.
    [J]. Revue d'Intelligence Artificielle, 2019, 33 (06) : 453 - 460
  • [5] Distributed Algorithm for Text Documents Clustering Based on k-Means Approach
    Sarnovsky, Martin
    Carnoka, Noema
    [J]. INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2015, PT II, 2016, 430 : 165 - 174
  • [6] An improved K-Means text clustering algorithm based on Local Search
    Liu, Xiangwei
    [J]. 2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 11578 - 11581
  • [7] A K-means Text Clustering Algorithm Based on Subject Feature Vector
    Duo, Ji
    Zhang, Peng
    Hao, Liu
    [J]. JOURNAL OF WEB ENGINEERING, 2021, 20 (06) : 1935 - 1946
  • [8] Similarity matrix-based K-means algorithm for text clustering
    曹奇敏
    郭巧
    吴向华
    [J]. Journal of Beijing Institute of Technology, 2015, 24 (04) : 566 - 572
  • [9] Improved K-Means algorithm in text semantic clustering
    Ma, Junhong
    [J]. Open Cybernetics and Systemics Journal, 2014, 8 : 530 - 534
  • [10] A k-means based clustering algorithm
    Bloisi, Domenico Daniele
    Iocchi, Luca
    [J]. COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118