Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark

Cited by: 4
Authors
Santhi, V. [1]
Jose, Rini [1]
Affiliation
[1] Anna Univ, PSG Coll Technol, Coimbatore, Tamil Nadu, India
Keywords
Clustering; K-Means; Bat algorithm; Firefly algorithm; Big data; Spark;
DOI
10.1007/978-3-319-72344-0_12
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering divides data into meaningful, useful groups, known as clusters, without any prior knowledge about the data. One drawback of K-Means clustering is the need to estimate the initial centroids, which influence the performance of the algorithm. To overcome this issue, optimization algorithms such as Bat and Firefly are executed as a pre-processing step. These algorithms return optimal centroids, which are given as input to the K-Means algorithm. Because clustering is carried out on large data sets, Apache Spark, an open source software framework, is used. The performance of the optimization algorithms is evaluated and the best algorithm is determined.
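The pipeline the abstract describes (a pre-processing step proposes initial centroids, which K-Means then refines) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and does not use Spark. The function `seed_centroids` is a hypothetical stand-in that simply keeps the best of several random candidate sets; it is not the Bat or Firefly algorithm. All function names, parameters, and the toy data are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def seed_centroids(points, k, candidates=20, rng=None):
    """Hypothetical pre-processing step (NOT Bat or Firefly): draw several
    random candidate centroid sets and keep the one with the lowest SSE."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(candidates):
        cand = rng.sample(points, k)
        if best is None or sse(points, cand) < sse(points, best):
            best = cand
    return best


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's K-Means that starts from the supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# Toy data with two well-separated groups (illustrative only).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
init = seed_centroids(points, k=2)
final = kmeans(points, init)
```

In a Spark setting, the same idea would correspond to handing externally computed centroids to the distributed K-Means step rather than relying on its default initialization; how the paper does this is not stated in the abstract.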
Pages: 158-162
Number of pages: 5
Related papers
50 in total
  • [31] Parallel bisecting k-means with prediction clustering algorithm
    Li, Yanjun
    Chung, Soon M.
    JOURNAL OF SUPERCOMPUTING, 2007, 39 (01): : 19 - 37
  • [32] Enhanced Parallel Implementation of the K-Means Clustering Algorithm
    Baydoun, Mohammed
    Dawi, Mohammad
    Ghaziri, Hassan
    2016 3RD INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTATIONAL TOOLS FOR ENGINEERING APPLICATIONS (ACTEA), 2016, : 7 - 11
  • [33] Parallel batch k-means for Big data clustering
    Alguliyev, Rasim M.
    Aliguliyev, Ramiz M.
    Sukhostat, Lyudmila, V
    COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 152
  • [34] Parallel bisecting k-means with prediction clustering algorithm
    Yanjun Li
    Soon M. Chung
    The Journal of Supercomputing, 2007, 39 : 19 - 37
  • [35] A Comparative Study of K-Means, K-Means++ and Fuzzy C-Means Clustering Algorithms
    Kapoor, Akanksha
    Singhal, Abhishek
    2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2017,
  • [36] New initialization approaches for the k-means and particle swarm optimization based clustering algorithms
    Cinaroglu, Sinem
    Bulut, Hasan
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2018, 33 (02): : 413 - 422
  • [37] New initialization approaches for the k-means and particle swarm optimization based clustering algorithms
    K-ortalamalar ve parçacık sürü optimizasyonu tabanlı kümeleme algoritmaları için yeni ilklendirme yaklaşımları
    Bulut, Hasan (hasan.bulut@ege.edu.tr), 2018, Gazi Universitesi (33):
  • [38] Optimization of constitutive parameters of foundation soils k-means clustering analysis
    Muge Elif Orakoglu
    Cevdet Emin Ekinci
    Sciences in Cold and Arid Regions, 2013, 5 (05) : 626 - 636
  • [39] Clustering Performance of an Evolutionary K-Means Algorithm
    Nigro, Libero
    Cicirelli, Franco
    Pupo, Francesco
    PROCEEDINGS OF NINTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, VOL 9, ICICT 2024, 2025, 1054 : 359 - 369
  • [40] Statistically Improving K-means Clustering Performance
    Ihsanoglu, Abdullah
    Zaval, Mounes
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,