Improved k-means clustering algorithm and its applications

Cited by: 2
Authors
Qi, Hui [1, 2]
Li, Jinqing [2]
Di, Xiaoqiang [1, 2]
Ren, Weiwu [1, 2]
Zhang, Fengrong [3]
Affiliations
[1] National and Local Joint Engineering Research Center of Space and Optoelectronics Technology, Changchun University of Science and Technology, Changchun, China
[2] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China
[3] Northeast Normal University, Changchun, China
Source
Recent Patents on Engineering | 2019, Vol. 13, No. 4
Funding
National Social Science Fund of China
Keywords
Computer crime - Principal component analysis;
DOI
10.2174/1872212113666181203110611
Abstract
Background: The K-means algorithm runs in two steps: initialization and subsequent iterations. Initialization selects the initial cluster centers, and the subsequent iterations update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the cluster centers selected during initialization that a different choice of initial centers changes the algorithm's performance. Improving the initialization process has therefore become an important means of improving K-means performance.
Methods: This paper uses a new strategy to select the initial cluster centers. It first calculates the minimum and maximum values of the data along a chosen index (for lower-dimensional data, such as two-dimensional data, the feature with the larger variance or the distance to the origin can be used; for higher-dimensional data, PCA can be used to select the principal component with the largest variance), and then divides that range into equally sized sub-ranges. Next, the sub-ranges are adjusted based on the data distribution so that each sub-range contains as much data as possible. Finally, the mean of the data in each sub-range is calculated and used as an initial cluster center.
Results: Theoretical analysis shows that although the time complexity of the initialization process is linear, the algorithm has the characteristics of a superlinear initialization method. The algorithm is applied to two-dimensional GPS data analysis and high-dimensional network attack detection. Experimental results show that it achieves high clustering performance and clustering speed.
Conclusion: The proposed method reduces the number of subsequent K-means iterations without compromising clustering performance, which makes it suitable for large-scale data clustering. The algorithm is suitable for both low-dimensional and high-dimensional data. © 2019 Bentham Science Publishers.
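The initialization steps described under Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `range_based_init` is hypothetical, the one-dimensional index is always taken as the projection onto the first principal component, and the abstract does not state how sub-ranges are adjusted, so the sketch assumes a simple rule (each empty sub-range takes the upper half of the most populated one).

```python
import numpy as np


def range_based_init(X, k):
    """Hypothetical sketch: pick k initial centers from sub-ranges of a 1-D index."""
    X = np.asarray(X, dtype=float)
    if len(X) < k:
        raise ValueError("need at least k data points")

    # 1-D index: projection onto the principal component with the largest variance.
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    index = centered @ vt[0]

    # Divide [min, max] of the index into k equally sized sub-ranges.
    edges = np.linspace(index.min(), index.max(), k + 1)
    labels = np.clip(np.searchsorted(edges, index, side="right") - 1, 0, k - 1)

    # ASSUMED adjustment rule (not specified in the abstract): while a sub-range
    # is empty, give it the upper half of the most populated sub-range.
    counts = np.bincount(labels, minlength=k)
    while (counts == 0).any():
        empty = int(np.argmin(counts))
        fullest = int(np.argmax(counts))
        members = np.flatnonzero(labels == fullest)
        members = members[np.argsort(index[members])]
        labels[members[len(members) // 2:]] = empty
        counts = np.bincount(labels, minlength=k)

    # The mean of the data in each sub-range is an initial cluster center.
    return np.vstack([X[labels == b].mean(axis=0) for b in range(k)])
```

The returned array has shape `(k, n_features)` and could be handed to any K-means implementation that accepts explicit initial centers. The SVD step used here is not linear-time in general; for low-dimensional data the abstract's alternative (the highest-variance feature or the distance to the origin) would avoid it.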
Pages: 403-409