An improved K-means algorithm for big data

Cited by: 12
Authors
Moodi, Fatemeh [1]
Saadatfar, Hamid [2]
Affiliations
[1] Hormozan Higher Educ Inst, Comp Engn Dept, Birjand, Iran
[2] Univ Birjand, Comp Engn Dept, Univ Blvd, Birjand, Southern Khoras, Iran
Keywords
Iterative methods; K-means clustering
DOI
10.1049/sfw2.12032
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
This paper presents an improved version of the K-means clustering algorithm that can be applied to big data with lower processing loads and acceptable precision. The method uses the distances from each point to its two nearest centroids, together with how those distances changed over the last two iterations. Points whose equidistance threshold exceeded the equidistance index were excluded from the distance calculations and stabilised in their cluster. These excluded points are still compared against another index, the cluster radius, in later iterations: a point is returned to the calculations if its distance from the stabilised cluster centroid is greater than the cluster radius. This can improve clustering quality. Computer experiments on both synthetic and real datasets show that the method improves clustering quality by up to 41.85% in the best-case scenario. According to the findings, the proposed method is highly beneficial for big data.
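The abstract describes the freezing and reactivation mechanism only in outline. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The per-point "equidistance" value is assumed to be the normalised gap (d2 - d1) / d2 between the distances to the two nearest centroids. `eq_index` is a hypothetical threshold on that gap, and a point is frozen only if the gap also did not shrink since the previous iteration. The cluster radius is assumed to be the largest nearest-centroid distance among the cluster's active points. Farthest-first seeding is an arbitrary deterministic choice. The names `kmeans_with_freezing`, `farthest_first` and `dist` are likewise hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(data: List[Point], k: int) -> List[Point]:
    """Assumed seeding: start at data[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans_with_freezing(data: List[Point], k: int, eq_index: float = 0.5,
                         max_iter: int = 100) -> Tuple[List[Point], List[int], int, int]:
    """Hypothetical sketch. Returns (centroids, labels, distance computations, iterations)."""
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= len(data)")
    n = len(data)
    centroids = farthest_first(data, k)
    labels = [-1] * n
    frozen = [False] * n
    prev_ratio: List[Optional[float]] = [None] * n  # gap ratio from the previous iteration
    radius = [math.inf] * k                          # assumed cluster radius (active points only)
    n_dist = 0
    it = 0
    for it in range(1, max_iter + 1):
        # Assignment: only active (non-frozen) points search all k centroids.
        new_labels = labels[:]
        new_radius = [0.0] * k
        seen = [False] * k
        for i, p in enumerate(data):
            if frozen[i]:
                continue
            (d1, j1), (d2, _) = sorted((dist(p, c), j) for j, c in enumerate(centroids))[:2]
            n_dist += k
            new_labels[i] = j1
            new_radius[j1] = max(new_radius[j1], d1)
            seen[j1] = True
            # Normalised gap between the two nearest centroids, in [0, 1].
            ratio = (d2 - d1) / d2 if d2 > 0 else 0.0
            # Assumed freezing rule: gap above eq_index and not shrinking
            # compared with the previous iteration.
            if prev_ratio[i] is not None and ratio > eq_index and ratio >= prev_ratio[i]:
                frozen[i] = True
            prev_ratio[i] = ratio
        # Keep the old radius for clusters that had no active points this round.
        radius = [new_radius[j] if seen[j] else radius[j] for j in range(k)]
        # Update: standard mean step over all members, frozen or not.
        for j in range(k):
            members = [data[i] for i in range(n) if new_labels[i] == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        # Reactivation: one distance per frozen point, checked against its cluster radius.
        reactivated = False
        for i in range(n):
            if frozen[i]:
                n_dist += 1
                if dist(data[i], centroids[new_labels[i]]) > radius[new_labels[i]]:
                    frozen[i] = False
                    prev_ratio[i] = None
                    reactivated = True
        # Labels fully determine the centroids, so unchanged labels mean convergence.
        if new_labels == labels and not reactivated:
            break
        labels = new_labels
    return centroids, labels, n_dist, it
```

In this sketch, frozen points still contribute to the centroid update. The intended saving is that each frozen point costs one distance computation per iteration (the radius check) instead of k. The returned distance count can be compared with the n × k × iterations cost of standard K-means. On tiny datasets that converge in one or two iterations, the reactivation checks can make it no cheaper, so any benefit would depend on data size and iteration count.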
Pages: 48-59
Number of pages: 12