Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark

Cited by: 4
Authors
Santhi, V. [1]
Jose, Rini [1]
Affiliations
[1] Anna Univ, PSG Coll Technol, Coimbatore, Tamil Nadu, India
Keywords
Clustering; K-Means; Bat algorithm; Firefly algorithm; Big data; Spark
DOI
10.1007/978-3-319-72344-0_12
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Clustering divides data into meaningful, useful groups, known as clusters, without any prior knowledge about the data. One drawback of K-Means clustering is its dependence on the estimated initial centroids, which influence the performance of the algorithm. To overcome this issue, optimization algorithms such as Bat and Firefly are executed as a pre-processing step. These algorithms return optimal centroids, which are given as input to the K-Means algorithm. Because clustering is carried out on large data sets, Apache Spark, an open-source software framework, is used. The performance of the optimization algorithms is evaluated and the best algorithm is determined.
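The pipeline described in the abstract (an optimizer proposes initial centroids, and K-Means then refines them) can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: the function names (`sse`, `firefly_centroids`, `kmeans`, `make_blobs`), the parameter values, the synthetic data, and the simplified Firefly update rule are all assumptions chosen for illustration, and the sketch ignores the fact that two candidate solutions may list their centroids in different orders.

```python
import math
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def firefly_centroids(points, k, n_fireflies=8, n_iter=30,
                      beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Simplified Firefly search (assumed variant, not the paper's).

    Each firefly is one candidate set of k centroids; lower SSE means brighter.
    """
    rng = random.Random(seed)
    swarm = [[list(p) for p in rng.sample(points, k)]
             for _ in range(n_fireflies)]
    cost = [sse(points, f) for f in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    # Squared distance between the two candidate solutions.
                    r2 = sum((a - b) ** 2
                             for ci, cj in zip(swarm[i], swarm[j])
                             for a, b in zip(ci, cj))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    # Move i toward j, plus a small random perturbation.
                    swarm[i] = [[a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                 for a, b in zip(ci, cj)]
                                for ci, cj in zip(swarm[i], swarm[j])]
                    cost[i] = sse(points, swarm[i])
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best]


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids


def make_blobs(seed=1):
    """Synthetic 2-D data: three Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    centers = [(0, 0), (10, 10), (0, 10)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in centers for _ in range(50)]


if __name__ == "__main__":
    data = make_blobs()
    init = firefly_centroids(data, k=3)   # pre-processing step
    final = kmeans(data, init)            # K-Means seeded with those centroids
    print("SSE after Firefly:", sse(data, init))
    print("SSE after K-Means:", sse(data, final))
```

In this sketch the optimizer only supplies a starting point; K-Means still performs the final refinement, so the quality of the result is measured by the sum of squared errors after both stages. A Bat-style search could be substituted for `firefly_centroids` without changing the `kmeans` call.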
Pages: 158-162
Number of pages: 5