Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark

Cited by: 4
Authors
Santhi, V. [1]
Jose, Rini [1]
Affiliations
[1] Anna Univ, PSG Coll Technol, Coimbatore, Tamil Nadu, India
Keywords
Clustering; K-Means; Bat algorithm; Firefly algorithm; Big data; Spark
DOI
10.1007/978-3-319-72344-0_12
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Clustering divides data into meaningful, useful groups, known as clusters, without any prior knowledge about the data. One drawback of K-Means clustering is its dependence on the initial centroid estimates, which influence the performance of the algorithm. To overcome this issue, optimization algorithms such as Bat and Firefly are executed as a pre-processing step. These algorithms return optimal centroids, which are given as input to the K-Means algorithm. Because clustering is carried out on large data sets, Apache Spark, an open source software framework, is used. The performance of the optimization algorithms is evaluated and the best algorithm is determined.
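The pipeline described in the abstract (an optimization algorithm proposes centroids, which then seed K-Means) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and does not use Spark. It assumes the standard Firefly update rule, and the function names (`sse`, `firefly_centroids`, `kmeans`), the parameter values (`beta0`, `gamma`, `alpha`, swarm size, iteration counts), and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def firefly_centroids(points, k, n_fireflies=10, n_iter=30,
                      beta0=1.0, gamma=0.01, alpha=0.1, seed=0):
    """Firefly search over candidate centroid sets (all parameters are assumed)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(vec):
        # Turn a flat vector of k * dim coordinates back into k centroids.
        return [tuple(vec[i * dim:(i + 1) * dim]) for i in range(k)]

    # Each firefly encodes k centroids, initialised from randomly chosen data points.
    flies = [[x for p in rng.sample(points, k) for x in p]
             for _ in range(n_fireflies)]
    cost = [sse(points, unflatten(f)) for f in flies]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower SSE)
                    r2 = sum((a - b) ** 2 for a, b in zip(flies[i], flies[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    # Move firefly i toward j, plus a small random step.
                    flies[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(flies[i], flies[j])]
                    cost[i] = sse(points, unflatten(flies[i]))
        alpha *= 0.97  # gradually shrink the random step

    best = min(range(n_fireflies), key=cost.__getitem__)
    return unflatten(flies[best])


def kmeans(points, centroids, n_iter=20):
    """Lloyd's K-Means starting from the supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid is the cluster mean; an empty cluster keeps its old centroid.
        centroids = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[c]
                     for c, cl in enumerate(clusters)]
    return centroids


# Synthetic 2-D data: three Gaussian blobs (illustrative only).
data_rng = random.Random(42)
data = [(data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(40)]

seeds = firefly_centroids(data, k=3)   # pre-processing step
final = kmeans(data, seeds)            # K-Means seeded with the result
print("SSE after firefly seeding:", sse(data, seeds))
print("SSE after K-Means:", sse(data, final))
```

In this sketch the within-cluster sum of squared errors serves as the firefly brightness, which is one common choice. The abstract does not state which objective function or parameter settings the paper uses.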
Pages: 158-162
Number of pages: 5
Related papers
50 in total
  • [1] A Comparative Performance Analysis of Fast K-Means Clustering Algorithms
    Beecks, Christian
    Berns, Fabian
    Huewel, Jan David
    Linxen, Andrea
    Schlake, Georg Stefan
    Duesterhus, Tim
    [J]. INFORMATION INTEGRATION AND WEB INTELLIGENCE, IIWAS 2022, 2022, 13635 : 119 - 125
  • [2] Performance of Parallel K-Means Algorithms in Java
    Nigro, Libero
    [J]. ALGORITHMS, 2022, 15 (04)
  • [3] Implementation of hadoop optimization K-means parallel clustering algorithm
    Huang, Suyu
    Tan, Lingli
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 125 : 160 - 160
  • [4] Performance Analysis of K-Means Seeding Algorithms
    Ortiz-Bejar, Jose
    Tellez, Eric S.
    Graff, Mario
    Ortiz-Bejar, Jesus
    Jacobo, Jaime Cerda
    Zamora-Mendez, Alejandro
    [J]. 2019 IEEE INTERNATIONAL AUTUMN MEETING ON POWER, ELECTRONICS AND COMPUTING (ROPEC 2019), 2019,
  • [5] Towards Enhancement of Performance of K-Means Clustering Using Nature-Inspired Optimization Algorithms
    Fong, Simon
    Deb, Suash
    Yang, Xin-She
    Zhuang, Yan
    [J]. SCIENTIFIC WORLD JOURNAL, 2014,
  • [6] Comparative Study of Two Parallel Algorithm K-Means and DBSCAN Clustering on Spark Platform
    Bouhout, Safae
    Oubenaalla, Youness
    Nfaoui, El Habib
    [J]. ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 245 - 262
  • [7] Manifold optimization for k-means clustering
    Carson, Timothy
    Mixon, Dustin G.
    Villar, Soledad
    Ward, Rachel
    [J]. 2017 INTERNATIONAL CONFERENCE ON SAMPLING THEORY AND APPLICATIONS (SAMPTA), 2017, : 73 - 77
  • [8] A Survey on Various K-Means algorithms for Clustering
    Singh, Malwinder
    Bansal, Meenakshi
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2015, 15 (06): 60 - 65
  • [9] Clustering performance comparison using K-means and expectation maximization algorithms
    Jung, Yong Gyu
    Kang, Min Soo
    Heo, Jun
    [J]. BIOTECHNOLOGY & BIOTECHNOLOGICAL EQUIPMENT, 2014, 28 : S44 - S48
  • [10] The seeding algorithms for spherical k-means clustering
    Min Li
    Dachuan Xu
    Dongmei Zhang
    Juan Zou
    [J]. Journal of Global Optimization, 2020, 76 : 695 - 708