Improved k-means clustering algorithm and its applications

Cited by: 2
Authors
Qi, Hui [1, 2]
Li, Jinqing [2]
Di, Xiaoqiang [1, 2]
Ren, Weiwu [1, 2]
Zhang, Fengrong [3]
Affiliations
[1] National and Local Joint Engineering Research Center of Space and Optoelectronics Technology, Changchun University of Science and Technology, Changchun, China
[2] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China
[3] Northeast Normal University, Changchun, China
Source
Recent Patents on Engineering, 2019, Vol. 13, No. 4
Funding
National Social Science Fund of China
Keywords
Computer crime; Principal component analysis
DOI
10.2174/1872212113666181203110611
Abstract
Background: The K-means algorithm runs in two stages: initialization and iteration. Initialization selects the initial cluster centers; the iterations then repeatedly update the cluster centers until they no longer change or the maximum number of iterations is reached. K-means is highly sensitive to the initial cluster centers, so choosing different initial centers changes the algorithm's performance. Improving the initialization has therefore become an important way to improve K-means.

Methods: This paper proposes a new strategy for selecting the initial cluster centers. It first computes the minimum and maximum values of the data on a chosen index. For low-dimensional data, such as two-dimensional data, the index can be a feature with large variance or the distance to the origin; for high-dimensional data, PCA can be used to select the principal component with the largest variance. The resulting range is divided into equal-sized sub-ranges, which are then adjusted according to the data distribution so that each sub-range contains as much data as possible. Finally, the mean of the data in each sub-range is computed and used as an initial cluster center.

Results: Theoretical analysis shows that although the initialization runs in linear time, the algorithm has the characteristics of superlinear initialization methods. The algorithm is applied to two-dimensional GPS data analysis and to high-dimensional network attack detection. Experimental results show that it achieves high clustering performance and high clustering speed.

Conclusion: This paper reduces the number of subsequent K-means iterations without compromising clustering performance, which makes the algorithm suitable for large-scale data clustering. It applies not only to low-dimensional data but also to high-dimensional data. © 2019 Bentham Science Publishers.
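The initialization described under Methods, followed by the iteration stage described under Background, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the function names (`projection_index`, `range_partition_init`, `kmeans`) are hypothetical; the number of sub-ranges is assumed to equal the number of clusters k; the index is always taken as the projection on the first principal component; and because the abstract does not specify how the sub-ranges are adjusted, the sketch substitutes an assumed rule that falls back to equal-frequency sub-ranges whenever an equal-width sub-range is empty. The iteration stage is ordinary Lloyd k-means with the stopping rule from the Background paragraph.

```python
import numpy as np


def projection_index(X):
    """Score each point by its projection on the first principal component.

    Sketch assumption: PCA (via SVD) is used for every input, although the
    abstract also allows a high-variance feature or the distance to the
    origin for low-dimensional data.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions; Vt[0] carries the largest variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]


def range_partition_init(X, k):
    """Initial centers from k sub-ranges of the 1-D index (requires len(X) >= k)."""
    s = projection_index(X)
    # Split [min, max] of the index into k equal-width sub-ranges.
    edges = np.linspace(s.min(), s.max(), k + 1)
    labels = np.clip(np.searchsorted(edges, s, side="right") - 1, 0, k - 1)
    # ASSUMPTION: the abstract does not give the adjustment rule. Here, if any
    # sub-range is empty, the boundaries are replaced by equal-frequency ones.
    if np.bincount(labels, minlength=k).min() == 0:
        order = np.argsort(s, kind="stable")
        labels = np.empty(len(s), dtype=int)
        labels[order] = np.arange(len(s)) * k // len(s)
    # The mean of the points in each sub-range becomes an initial center.
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])


def kmeans(X, k, max_iter=100):
    """Standard Lloyd iterations started from range_partition_init."""
    centers = range_partition_init(X, k)
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            return new_centers, labels, it
        centers = new_centers
    return centers, labels, max_iter


# Usage: three well-separated 2-D blobs (synthetic data, not from the paper).
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in true_centers])
centers, labels, n_iter = kmeans(X, 3)
print("iterations:", n_iter)
print(np.round(centers, 2))
```

Note that the assumed equal-frequency fallback sorts the index, which costs O(n log n), so this sketch does not by itself reproduce the linear-time initialization bound stated under Results.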
Pages: 403-409