A Spark-based Artificial Bee Colony Algorithm for Large-scale Data Clustering

Cited by: 4
Authors
Wang, Yanjie [1]
Qian, Quan [1, 2, 3]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
Clustering; Artificial bee colony; Spark; Optimization
DOI
10.1109/HPCC/SmartCity/DSS.2018.00204
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is one of the most common data analysis methods. It aims to partition data into a given number of clusters so that data within the same cluster are similar to one another and dissimilar from data in other clusters. Our research goal is to find more efficient clustering algorithms for large-scale data. Spark is one of the most popular distributed computing platforms and provides a series of high-level APIs for building high-performance parallel applications. The Spark-based artificial bee colony algorithm proposed in this paper combines the robust artificial bee colony algorithm with the powerful Spark framework, making it well suited to clustering large-scale data. To verify the effectiveness of this method, we use the KDD CUP 99 data, an open competition dataset, as the experimental data. The experimental results show that our algorithm achieves good clustering quality and almost ideal speedup compared with the serial algorithms.
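The abstract does not give the details of the algorithm, so the following is only a minimal serial sketch of how a generic artificial bee colony search can be applied to clustering. It is not the authors' implementation. It assumes each food source is a set of k candidate centroids, that cost is the sum of squared distances from each point to its nearest centroid, and that fitness is 1 / (1 + cost). The names `sse`, `abc_cluster`, `n_sources`, and `limit` are hypothetical.

```python
import random


def sse(centroids, data):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in data:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def abc_cluster(data, k, n_sources=10, limit=20, iters=100, seed=0):
    """Generic ABC search over sets of k centroids (illustrative, not the paper's method)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]

    def random_source():
        # A food source is a list of k centroids drawn inside the data bounds.
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]

    def try_improve(i):
        # Perturb one coordinate of one centroid relative to another random source,
        # then keep the candidate only if it lowers the cost (greedy selection).
        j = rng.choice([s for s in range(n_sources) if s != i])
        c, d = rng.randrange(k), rng.randrange(dim)
        cand = [list(cen) for cen in sources[i]]
        cand[c][d] += rng.uniform(-1.0, 1.0) * (sources[i][c][d] - sources[j][c][d])
        cost = sse(cand, data)
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    costs = [sse(s, data) for s in sources]
    trials = [0] * n_sources
    i_best = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = [list(c) for c in sources[i_best]], costs[i_best]

    for _ in range(iters):
        # Employed bee phase: every source gets one local search step.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bee phase: better sources are picked more often.
        weights = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        # Remember the best solution before scouts may discard it.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = [list(c) for c in sources[i_best]], costs[i_best]
        # Scout bee phase: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = sse(sources[i], data)
                trials[i] = 0
    return best, best_cost


if __name__ == "__main__":
    points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
    centroids, cost = abc_cluster(points, k=2)
    print(centroids, cost)
```

In this sketch the dominant cost is the loop over all data points inside `sse`. In a distributed setting, that per-point sum is the natural candidate for parallel evaluation across partitions. How the paper actually maps the computation onto Spark is not stated in the abstract.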
Pages: 1213-1218
Number of pages: 6