Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark

Cited by: 4
Authors
Santhi, V. [1]
Jose, Rini [1]
Affiliation
[1] Anna University, PSG College of Technology, Coimbatore, Tamil Nadu, India
Keywords
Clustering; K-Means; Bat algorithm; Firefly algorithm; Big data; Spark
DOI
10.1007/978-3-319-72344-0_12
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Clustering divides data into meaningful, useful groups, known as clusters, without any prior knowledge about the data. One drawback of K-Means clustering is the selection of initial centroids, which influences the performance of the algorithm. To overcome this issue, optimization algorithms such as the Bat and Firefly algorithms are executed as a pre-processing step. These algorithms return optimal centroids, which are given as input to the K-Means algorithm. Because clustering is carried out on large data sets, Apache Spark, an open-source software framework, is used. The performance of the optimization algorithms is evaluated and the best-performing algorithm is determined.
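The pre-processing idea described in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' implementation: it assumes a basic Firefly algorithm in which each firefly encodes k candidate centroids and its brightness is the negative sum of squared errors (SSE), and the brightest firefly then seeds a plain Lloyd's K-Means. The function names and all parameter values (`beta0`, `gamma`, `alpha`, swarm size, the 0.97 decay) are illustrative assumptions; a Bat-algorithm variant would replace only the search step.

```python
import math
import random


def sse(data, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in data)


def firefly_centroids(data, k, n_fireflies=15, n_iter=50,
                      beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Search for k initial centroids with a basic Firefly algorithm (illustrative).

    Each firefly is a list of k centroids; a lower SSE means a brighter firefly.
    """
    rng = random.Random(seed)
    # Start every firefly from k distinct data points chosen at random.
    swarm = [[list(p) for p in rng.sample(data, k)] for _ in range(n_fireflies)]
    cost = [sse(data, f) for f in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2
                             for ci, cj in zip(swarm[i], swarm[j])
                             for a, b in zip(ci, cj))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [[a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                 for a, b in zip(ci, cj)]
                                for ci, cj in zip(swarm[i], swarm[j])]
                    cost[i] = sse(data, swarm[i])
        alpha *= 0.97  # gradually shrink the random step
    best = min(range(n_fireflies), key=lambda idx: cost[idx])
    return swarm[best]


def kmeans(data, centroids, max_iter=100):
    """Lloyd's K-Means started from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # New centroid = mean of its points; an empty cluster keeps its old centroid.
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else c
               for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
            (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
    init = firefly_centroids(data, k=2)
    final = kmeans(data, init)
    print(sorted(final), sse(data, final))
```

In this sketch the expensive parts, the SSE evaluation inside the search and the assignment step of K-Means, are the ones that would be distributed on Spark. As one possible hand-off, PySpark's RDD-based `pyspark.mllib.clustering.KMeans.train` accepts an `initialModel` argument (a `KMeansModel` built from given centres), so centroids found by such a search could be passed to Spark's K-Means instead of its default initialization.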
Pages: 158-162
Number of pages: 5
Related papers
50 in total
  • [41] CPI-model-based analysis of sparse k-means clustering algorithms
    Aoyama, Kazuo
    Saito, Kazumi
    Ikeda, Tetsuo
    International Journal of Data Science and Analytics, 2021, 12 (03): 229-248
  • [43] Comparative Analysis of K-Means with other Clustering Algorithms to Improve Search Result
    Mehrotra, Shashi
    Kohli, Shruti
    2015 International Conference on Green Computing and Internet of Things (ICGCIOT), 2015: 309-313
  • [44] Workplace Accidents Analysis with a Coupled Clustering Methods: SOM and K-means Algorithms
    Comberti, Lorenzo
    Baldissone, Gabriele
    Demichela, Micaela
    ICHEAP12: 12th International Conference on Chemical & Process Engineering, 2015, 43: 1261-1266
  • [45] Comparison between K-Means and K-Medoids Clustering Algorithms
    Madhulatha, Tagaram Soni
    Advances in Computing and Information Technology, 2011, 198: 472-481
  • [46] Performance of Parallel K-Means Based on Theatre
    Cicirelli, Franco
    Nigro, Libero
    Pupo, Francesco
    Proceedings of Seventh International Congress on Information and Communication Technology, Vol 4, 2023, 465: 241-249
  • [47] Performance Analysis of K Means Clustering Algorithms for mMTC Systems
    Kim, Haesik
    11th International Conference on ICT Convergence: Data, Network, and AI in the Age of Untact (ICTC 2020), 2020: 30-35
  • [48] K-means - a fast and efficient K-means algorithms
    Nguyen, C. D.
    Duong, T. H.
    Inderscience Publishers, 2018, (11): 27-45
  • [49] A Novel Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance
    Kishor, Duggirala Raja
    Venkateswarlu, N. B.
    International Journal of Ambient Computing and Intelligence, 2016, 7 (02): 47-74
  • [50] A Binary Optimization Approach for Constrained K-Means Clustering
    Le, Huu M.
    Eriksson, Anders
    Do, Thanh-Toan
    Milford, Michael
    Computer Vision - ACCV 2018, Part IV, 2019, 11364: 383-398