An improved K-means algorithm for big data

Cited by: 12

Authors:

Moodi, Fatemeh [1]

Saadatfar, Hamid [2]

Affiliations:

[1] Hormozan Higher Educ Inst, Comp Engn Dept, Birjand, Iran

[2] Univ Birjand, Comp Engn Dept, Univ Blvd, Birjand, Southern Khoras, Iran

Source:

IET SOFTWARE | 2022 / Vol. 16 / Issue 01

Keywords:

Iterative methods - K-means clustering;

DOI:

10.1049/sfw2.12032

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

This paper presents an improved version of the K-means clustering algorithm that can be applied to big data by lowering the processing load while keeping an acceptable precision rate. The method uses the distances from each point to its two nearest centroids, together with the variations of those distances over the last two iterations. Points whose equidistance threshold is greater than the equidistance index are eliminated from the distance calculations and stabilised in their cluster. In each later iteration of the algorithm, these points are compared again with the research index (the cluster radius): an excluded point is included in the calculations again if its distance from the centroid of its stabilised cluster is longer than the cluster radius. This can improve the clustering quality. Computerised tests on synthetic and real samples show that the method is able to improve the clustering quality by up to 41.85% in the best-case scenario. According to these findings, the proposed method is well suited to big data.
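The abstract describes the idea only in outline, so the following is a minimal sketch of that idea rather than the authors' algorithm. Several parts are assumptions made purely for illustration: the function name `kmeans_with_freezing`, the use of the gap between the two nearest centroid distances (with a hypothetical `margin` parameter) as a stand-in for the equidistance index, the "label unchanged since the previous pass" test as a stand-in for the variation over the last two iterations, and a cluster radius defined as `radius_scale` times the mean member-to-centroid distance.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_with_freezing(points, init, margin=2.0, radius_scale=1.5, max_iter=100):
    """Lloyd-style K-means that skips distance work for 'stabilised' points.

    Illustrative sketch only; `margin` and `radius_scale` are hypothetical
    parameters, not quantities defined in the paper. Needs at least 2 centroids.
    """
    centroids = [list(c) for c in init]
    k = len(centroids)
    labels = [-1] * len(points)
    frozen = [False] * len(points)

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            if frozen[i]:
                continue  # stabilised point: skip its k distance calculations
            ranked = sorted((dist(p, c), j) for j, c in enumerate(centroids))
            (d1, nearest), (d2, _) = ranked[0], ranked[1]
            if labels[i] == nearest:
                # Assumed criterion: label is stable and the second-nearest
                # centroid is farther than the nearest by more than `margin`.
                frozen[i] = (d2 - d1) > margin
            else:
                labels[i] = nearest
                changed = True
        if not changed:
            break

        # Standard centroid update: mean of each cluster's members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

        # Radius check: a frozen point that now lies outside its cluster's
        # (assumed) radius rejoins the distance calculations in the next pass.
        for j in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == j]
            if not idx:
                continue
            d = {i: dist(points[i], centroids[j]) for i in idx}
            radius = radius_scale * sum(d.values()) / len(idx)
            for i in idx:
                if frozen[i] and d[i] > radius:
                    frozen[i] = False

    return labels, centroids, frozen
```

A frozen point costs one distance evaluation per pass (for the radius check) instead of one per centroid, which is where the reduced processing load would come from in this sketch. Calling the function on two well-separated groups of points, with one initial centroid taken from each group, assigns each group its own label and ends with every point frozen.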

Pages: 48-59

Page count: 12

