Particle Swarm Optimization for Large-Scale Clustering on Apache Spark

Authors
Sherar, Matthew [1]
Zulkernine, Farhana [1]
Affiliation
[1] Queens Univ, Sch Comp, Kingston, ON, Canada
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a particle swarm optimization (PSO) clustering algorithm implemented in Apache Spark to achieve parallel big data clustering. Apache Spark is an in-memory big data analytics framework that uses parallel distributed processing to analyze large amounts of data faster than most other existing data analytics tools. Spark's library of data analytics functions does not include the PSO algorithm. PSO is an evolutionary computing technique that has been shown to produce more compact clusters than other partitional clustering techniques for a wide range of data. In addition, PSO is a parallelizable and customizable algorithm well suited to multi-objective clustering problems. In this paper we present our implementation of a hybrid K-Means PSO (KMPSO) clustering algorithm in Apache Spark and demonstrate the performance gained in Spark by comparing our implementation with an implementation of KMPSO in MATLAB. We demonstrate that KMPSO can produce better clustering results than Spark's built-in clustering algorithms, and that Apache Spark enables efficient scaling of resources to handle large and complex workloads.
Pages: 801-808
Number of pages: 8
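
The abstract names a hybrid K-Means PSO (KMPSO) clustering algorithm but gives no implementation details. The sketch below shows one common way such a hybrid can be built, and it is not the authors' Spark implementation. The assumptions are: each particle encodes k centroids, fitness is the mean squared distance from each point to its nearest centroid, and one particle is seeded with a short K-Means run. The function names (`kmpso`, `kmeans`, `fitness`) and the parameter values (`w`, `c1`, `c2`) are illustrative and not taken from the paper; the code runs on a single machine in plain Python.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(centroids, data):
    # Mean squared distance from each point to its nearest centroid (lower is better).
    return sum(min(sq_dist(p, c) for c in centroids) for p in data) / len(data)

def kmeans(data, k, iters, rng):
    # Plain Lloyd iterations starting from k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def kmpso(data, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: particle 0 is seeded with K-Means, the rest with random data points.
    swarm = [kmeans(data, k, 5, rng)]
    swarm += [[list(p) for p in rng.sample(data, k)] for _ in range(n_particles - 1)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in s] for s in swarm]
    pbest_fit = [fitness(s, data) for s in swarm]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard PSO update: inertia + cognitive pull + social pull.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - swarm[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - swarm[i][j][d]))
                    swarm[i][j][d] += vel[i][j][d]
            f = fitness(swarm[i], data)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in swarm[i]], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in swarm[i]], f
    return gbest, gbest_fit
```

Calling `kmpso(points, k=2)` on a list of equal-length numeric points returns the best centroid set found and its fitness. Because the global best starts from the K-Means-seeded particle or better and is only ever replaced by a lower-fitness position, the returned fitness is never worse than that of the seed. In a distributed setting the per-point nearest-centroid computation inside `fitness` is the natural step to parallelize, though how the paper partitions the work is not stated in the abstract.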