Improved k-means clustering algorithm and its applications

Cited by: 2

Authors:

Qi, Hui [1,2]

Li, Jinqing [2]

Di, Xiaoqiang [1,2]

Ren, Weiwu [1,2]

Zhang, Fengrong [3]

Affiliations:

[1] National and Local Joint Engineering Research Center of Space and Optoelectronics Technology, Changchun University of Science and Technology, Changchun, China

[2] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China

[3] Northeast Normal University, Changchun, China

Source:

Recent Patents on Engineering | 2019 / Vol. 13 / No. 4

Funding:

National Social Science Fund of China;

Keywords:

Computer crime; Principal component analysis;

DOI:

10.2174/1872212113666181203110611

Abstract:

Background: The K-means algorithm runs in two stages: initialization and subsequent iterations. Initialization selects the initial cluster centers; the subsequent iterations repeatedly update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the initial cluster centers that choosing different ones changes the algorithm's performance, so improving initialization has become an important way to improve K-means.

Methods: This paper uses a new strategy to select the initial cluster centers. It first computes the minimum and maximum values of the data on a chosen index (for low-dimensional data, such as two-dimensional data, a feature with large variance or the distance to the origin can be used; for high-dimensional data, PCA can be used to select the principal component with the largest variance) and then divides this range into equal-width sub-ranges. Next, it adjusts the sub-ranges according to the data distribution so that each sub-range contains as much data as possible. Finally, it computes the mean of the data in each sub-range and uses it as an initial cluster center.

Results: Theoretical analysis shows that although the initialization process has linear time complexity, the algorithm has the characteristics of superlinear initialization methods. The algorithm is applied to two-dimensional GPS data analysis and high-dimensional network attack detection, and experimental results show that it achieves high clustering performance and fast clustering speed.

Conclusion: The method reduces the number of subsequent K-means iterations without compromising clustering performance, which makes it suitable for large-scale data clustering. It applies not only to low-dimensional data but also to high-dimensional data.

© 2019 Bentham Science Publishers.
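The initialization described in the Methods paragraph, followed by ordinary K-means iterations, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses pure standard-library Python. The function names `index_values`, `range_split_init` and `kmeans` are hypothetical. The first principal component is found by power iteration, which is our choice of method. The abstract does not say how the sub-ranges are "adjusted according to the data distribution", so the sketch substitutes a hypothetical stand-in rule: it drops empty sub-ranges, then splits the most populated one at the mean of its index values until k non-empty sub-ranges remain.

```python
import math


def _mean(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def index_values(X, method="pca", iters=100):
    """One scalar index per point, using one of the options the abstract lists."""
    if method == "norm":  # distance to the origin
        return [math.sqrt(sum(a * a for a in p)) for p in X]
    d = len(X[0])
    mu = _mean(X)
    C = [[p[j] - mu[j] for j in range(d)] for p in X]  # centered data
    if method == "variance":  # the single feature with the largest variance
        j = max(range(d), key=lambda c: sum(r[c] ** 2 for r in C))
        return [p[j] for p in X]
    if method == "pca":  # score on the first principal component
        # Power iteration w <- C^T (C w), started from the point farthest from the mean.
        w = list(max(C, key=lambda r: sum(a * a for a in r)))
        for _ in range(iters):
            s = [sum(a * b for a, b in zip(r, w)) for r in C]
            w = [sum(C[i][j] * s[i] for i in range(len(C))) for j in range(d)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            w = [a / norm for a in w]
        return [sum(a * b for a, b in zip(r, w)) for r in C]
    raise ValueError("unknown method: " + method)


def range_split_init(X, k, method="pca"):
    """Initial centers: means of the points falling in k sub-ranges of the index."""
    if not 1 <= k <= len(X):
        raise ValueError("need 1 <= k <= number of points")
    v = index_values(X, method)
    lo, hi = min(v), max(v)
    if hi == lo:
        raise ValueError("index is constant; its range cannot be split")
    # Equal-width sub-ranges: one pass assigns each point to its sub-range.
    groups = [[] for _ in range(k)]
    for i, x in enumerate(v):
        groups[min(int((x - lo) / (hi - lo) * k), k - 1)].append(i)
    # HYPOTHETICAL adjustment (the paper's actual rule is not given in the abstract):
    # drop empty sub-ranges, then split the most populated one at the mean of its
    # index values until there are k non-empty sub-ranges again.
    groups = [g for g in groups if g]
    while len(groups) < k:
        b = max(range(len(groups)), key=lambda j: len(groups[j]))
        g = groups.pop(b)
        cut = sum(v[i] for i in g) / len(g)
        left = [i for i in g if v[i] <= cut]
        right = [i for i in g if v[i] > cut]
        if not right:
            raise ValueError("too many tied index values to form k sub-ranges")
        groups[b:b] = [left, right]
    return [_mean([X[i] for i in g]) for g in groups]


def kmeans(X, centers, max_iter=100):
    """Lloyd iterations; stop when the centers stop changing or max_iter is reached."""
    centers = [list(c) for c in centers]
    labels = []
    for it in range(1, max_iter + 1):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(len(centers)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in X]
        new = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(X, labels) if lab == j]
            new.append(_mean(members) if members else c)  # keep an empty cluster's center
        if new == centers:
            return centers, labels, it
        centers = new
    return centers, labels, max_iter
```

For example, `range_split_init([[0.0], [1.0], [2.0], [10.0]], 3, method="variance")` first produces equal-width sub-ranges holding 3, 0 and 1 points. The stand-in adjustment drops the empty one and splits the first at its mean, returning `[[0.5], [2.0], [10.0]]`. The equal-width assignment is a single pass over the data, in line with the linear-time initialization the abstract describes. The power-iteration PCA step and the stand-in adjustment add their own cost, so the sketch makes no claim about matching the paper's complexity analysis.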

Pages: 403-409
