A Spark-based Artificial Bee Colony Algorithm for Large-scale Data Clustering

Cited by: 4
Authors
Wang, Yanjie [1 ]
Qian, Quan [1 ,2 ,3 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
Clustering; Artificial bee colony; Spark; OPTIMIZATION;
DOI
10.1109/HPCC/SmartCity/DSS.2018.00204
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is one of the most common data analysis methods. It aims to partition data into a given number of clusters so that data within the same cluster are similar to one another and dissimilar from data in other clusters. Our research goal is to find more efficient clustering algorithms for large-scale data. Spark is one of the most popular distributed computing platforms and provides a series of high-level APIs for building high-performance parallel applications. The Spark-based artificial bee colony algorithm proposed in this paper combines the robust artificial bee colony algorithm with the powerful Spark framework, which makes it well suited to clustering large-scale data. To verify the effectiveness of this method, we use the KDD CUP 99 data, an open competition dataset, as the experimental data. The experimental results show that our algorithm achieves good clustering quality and almost ideal speedup compared with the serial algorithms.
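The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a sum-of-squared-distances objective, simulates data partitions with plain Python lists instead of Spark, and uses a simplified artificial-bee-colony-style search (employed-bee and scout phases only; the onlooker phase is omitted). All function and parameter names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def partition_cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def total_cost(partitions, centroids):
    # Assumption: on a cluster, each partition would be scored in parallel
    # and the partial sums combined; here the partitions are plain lists.
    return sum(partition_cost(part, centroids) for part in partitions)

def neighbour(solution, other, rng):
    """ABC-style candidate: perturb one coordinate of one centroid
    using v = x + phi * (x - x_other), with phi drawn from [-1, 1]."""
    i = rng.randrange(len(solution))
    j = rng.randrange(len(solution[i]))
    phi = rng.uniform(-1.0, 1.0)
    candidate = [list(c) for c in solution]
    candidate[i][j] += phi * (solution[i][j] - other[i][j])
    return candidate

def abc_cluster(partitions, k, n_sources=10, iterations=100, limit=20, seed=0):
    """Simplified ABC search for k centroids; returns (centroids, cost)."""
    rng = random.Random(seed)
    points = [p for part in partitions for p in part]

    def random_solution():
        return [list(rng.choice(points)) for _ in range(k)]

    sources = [random_solution() for _ in range(n_sources)]
    costs = [total_cost(partitions, s) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = sources[costs.index(best_cost)]

    for _ in range(iterations):
        for idx in range(n_sources):
            # Employed-bee step: try a neighbour, keep it only if it is better.
            other_idx = rng.choice([m for m in range(n_sources) if m != idx])
            cand = neighbour(sources[idx], sources[other_idx], rng)
            cand_cost = total_cost(partitions, cand)
            if cand_cost < costs[idx]:
                sources[idx], costs[idx], trials[idx] = cand, cand_cost, 0
            else:
                trials[idx] += 1
            # Scout step: abandon a source that has stopped improving.
            if trials[idx] > limit:
                sources[idx] = random_solution()
                costs[idx] = total_cost(partitions, sources[idx])
                trials[idx] = 0
            if costs[idx] < best_cost:
                best, best_cost = sources[idx], costs[idx]
    return best, best_cost

# Two small, well-separated groups of points, split into two "partitions".
example_partitions = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]
centroids, cost = abc_cluster(example_partitions, k=2)
print(centroids, cost)
```

Because the objective is a sum over points, it splits cleanly into per-partition partial sums, which is the property a distributed implementation would rely on.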
Pages: 1213-1218
Page count: 6
Related papers
  • [1] A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification
    Al-Sawwa, Jamil
    Almseidin, Mohammad
    [J]. INFORMATION, 2022, 13 (11)
  • [2] A MapReduce-based artificial bee colony for large-scale data clustering
    Banharnsakun, Anan
    [J]. PATTERN RECOGNITION LETTERS, 2017, 93 : 78 - 84
  • [3] Improved Artificial Bee Colony Algorithm for Large-Scale Optimization Problems
    Gocho, Ryuta
    Utani, Akihide
    Yamamoto, Hisao
    [J]. PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 16TH '11), 2011, : 605 - 608
  • [4] Spark-based Large-scale Matrix Inversion for Big Data Processing
    Liang, Yang
    Liu, Jun
    Fang, Cheng
    Ansari, Nirwan
    [J]. 2016 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2016,
  • [5] Memetic Artificial Bee Colony Algorithm for Large-Scale Global Optimization
    Fister, Iztok
    Fister, Iztok, Jr.
    Brest, Janez
    Zumer, Viljem
    [J]. 2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [6] Spark-Based Large-Scale Matrix Inversion for Big Data Processing
    Liu, Jun
    Liang, Yang
    Ansari, Nirwan
    [J]. IEEE ACCESS, 2016, 4 : 2166 - 2176
  • [7] Many-objective artificial bee colony algorithm for large-scale software module clustering problem
    Amarjeet
    Chhabra, Jitender Kumar
    [J]. SOFT COMPUTING, 2018, 22 (19) : 6341 - 6361
  • [8] Many-objective artificial bee colony algorithm for large-scale software module clustering problem
    Jitender Kumar Amarjeet
    [J]. Soft Computing, 2018, 22 : 6341 - 6361
  • [9] A ranking paired based artificial bee colony algorithm for data clustering
    Xu, Haiping
    Dong, Zhengshan
    Xu, Meiqin
    Lin, Geng
    [J]. INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2022, 16 (04) : 389 - 398
  • [10] A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
    Ji, Jinchao
    Pang, Wei
    Zheng, Yanlin
    Wang, Zhe
    Ma, Zhiqiang
    [J]. PLOS ONE, 2015, 10 (05):