A K-means Text Clustering Algorithm Based on Subject Feature Vector

Cited by: 0
Authors
Duo, Ji [1]
Zhang, Peng [3]
Hao, Liu [2]
Affiliations
[1] Criminal Invest Police Univ China, Dept Cyber Crime Invest, Shenyang, Peoples R China
[2] Criminal Invest Police Univ China, Shenyang, Peoples R China
[3] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
Source
JOURNAL OF WEB ENGINEERING | 2021 / Vol. 20 / No. 06
Keywords
k-means; initial points; decision graph; iterative class center; subject feature vector;
DOI
10.13052/jwe1540-9589.20612
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
As one of the most popular clustering algorithms, k-means is easily influenced by the choice of initial points and the number of clusters. In addition, the iterative class center, calculated as the mean of all points in a cluster, is one of the factors that affect clustering performance. In this paper, representative initial points are selected according to a decision graph composed of the local density and distance of each point. We then propose an improved k-means text clustering algorithm whose iterative class center is composed of a subject feature vector, which can avoid the influence of noise. Experiments show that the initial points are selected successfully and that the clustering results improve by 3%, 5%, 2% and 7%, respectively, over the traditional k-means clustering algorithm on four experimental corpora from Fudan and Sougou.
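The abstract does not give the formulas behind the decision graph or the subject feature vector, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes the usual density-peaks definitions: the local density of a point is the number of neighbours within a cutoff distance, and its distance value is the distance to the nearest point of higher density. Ranking points by the product of the two values is one common heuristic for reading a decision graph. The helper `truncated_center` is a hypothetical stand-in for a subject-feature-vector center: it keeps only the `top_m` heaviest features of the cluster mean. All function names and parameters here are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_graph(points, cutoff):
    """Return (rho, delta) lists, one entry per point.

    Assumed definitions (density-peaks style, not taken from the paper):
    rho[i]   = number of other points closer than `cutoff`
    delta[i] = distance to the nearest point with strictly higher rho
    """
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def select_initial_points(points, k, cutoff):
    """Pick k indices with the largest rho * delta (an assumed heuristic)."""
    rho, delta = decision_graph(points, cutoff)
    ranked = sorted(range(len(points)),
                    key=lambda i: rho[i] * delta[i], reverse=True)
    return ranked[:k]


def truncated_center(vectors, top_m):
    """Hypothetical center: cluster mean with all but the top_m features zeroed."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    keep = set(sorted(range(dim), key=lambda d: mean[d], reverse=True)[:top_m])
    return [mean[d] if d in keep else 0.0 for d in range(dim)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
    print(select_initial_points(pts, k=2, cutoff=0.8))
```

In this toy data the two selected indices are the dense middle points of the two groups, which is the behaviour the decision graph is meant to provide. Points that tie for the highest density would all receive a large distance value in this sketch, so real data may need a tie-breaking rule.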
Pages: 1935-1946
Page count: 12
    [J]. Open Cybernetics and Systemics Journal, 2014, 8 : 530 - 534