Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 2
Authors
Mussabayev, Rustam [1,2]
Mussabayev, Ravil [1,3]
Affiliations
[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan
[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan
[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA
Keywords
Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means++; Unsupervised Learning
DOI
10.1007/978-981-97-4985-0_18
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel K-means clustering algorithm that extends the conventional Big-means method. The proposed method integrates parallel processing, stochastic sampling, and competitive optimization into a scalable variant designed for big data applications, addressing the scalability and computation-time limitations of traditional techniques. During execution, the algorithm dynamically adjusts the sample size used by each worker to improve performance. The results obtained with these sample sizes are continually analyzed to identify the most efficient configuration. Competition among workers that use different sample sizes further improves the efficiency of Big-means. In essence, the algorithm balances computation time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
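The abstract describes the method only at a high level. What follows is a minimal, hypothetical Python sketch of the general idea, not the authors' implementation. Each worker clusters a random sample of its own size, starting from a shared incumbent solution. After every round, all candidates are scored on a common evaluation sample. The worker with the worst score then adopts a randomly perturbed copy of the winner's sample size.

The following are all assumptions made for illustration:

* the function names (`worker_step`, `competitive_big_means`);
* the evaluation-sample scoring rule;
* the perturbation range of 0.5 to 1.5;
* the random-point initialization;
* leaving empty clusters unchanged.

Threads are used only to show the parallel structure. Because of CPython's global interpreter lock, they give no real speedup for this pure-Python code.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    # K-means objective: sum of squared distances to the nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def kmeans(points, centroids, iters=20):
    # Plain Lloyd iterations starting from the given centroids.
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        # Simplification (assumption): an empty cluster keeps its previous centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for c, g in zip(centroids, groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def worker_step(data, incumbent, sample_size, seed):
    # Hypothetical worker: cluster one random sample, warm-started from the incumbent.
    sample = random.Random(seed).sample(data, sample_size)
    return kmeans(sample, incumbent)


def competitive_big_means(data, k, sizes, rounds=10, eval_size=200, seed=0):
    # Hypothetical competitive loop; sizes[w] is worker w's current sample size.
    rng = random.Random(seed)
    incumbent = rng.sample(data, k)  # assumption: random-point initialization
    sizes = list(sizes)
    with ThreadPoolExecutor(max_workers=len(sizes)) as pool:
        for _ in range(rounds):
            seeds = [rng.randrange(2**32) for _ in sizes]
            candidates = list(pool.map(
                lambda args: worker_step(data, incumbent, *args), zip(sizes, seeds)))
            # Score all candidates on one shared evaluation sample.
            eval_sample = rng.sample(data, min(eval_size, len(data)))
            scores = [sse(eval_sample, c) for c in candidates]
            best = min(range(len(scores)), key=scores.__getitem__)
            worst = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] < sse(eval_sample, incumbent):
                incumbent = candidates[best]
            # Competition: the loser adopts a randomly perturbed copy of the winner's size.
            if worst != best:
                factor = rng.uniform(0.5, 1.5)
                sizes[worst] = max(k, min(len(data), int(sizes[best] * factor)))
    return incumbent, sizes


if __name__ == "__main__":
    gen = random.Random(42)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(300)]
    centroids, final_sizes = competitive_big_means(data, k=3, sizes=[20, 80, 300])
    print(centroids, final_sizes)
```

Each worker's random generator is seeded from a single master generator, which keeps runs reproducible even though the workers execute concurrently. A production version would more likely use processes or native code, and could measure wall-clock time per worker as part of the competition score.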
Pages: 224-236
Page count: 13
Related Papers
50 in total
  • [21] Memory-enriched big bang–big crunch optimization algorithm for data clustering
    Bijari, Kayvan
    Zare, Hadi
    Veisi, Hadi
    Bobarshad, Hossein
    NEURAL COMPUTING AND APPLICATIONS, 2018, 29 : 111 - 121
  • [22] Stochastic limited memory bundle algorithm for clustering in big data
    Karmitsa, Napsu
    Eronen, Ville-Pekka
    Maekelae, Marko M.
    Pahikkala, Tapio
    Airola, Antti
    PATTERN RECOGNITION, 2025, 165
  • [23] Parallel and distributed clustering framework for big spatial data mining
    Bendechache, Malika
    Tari, A-Kamel
    Kechadi, M-Tahar
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 671 - 689
  • [24] Parallel Clustering of Big Data of Spatio-temporal Trajectory
    Hu, Chunchun
    Kang, Xionghua
    Luo, Nianxue
    Zhao, Qiansheng
    2015 11TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2015, : 769 - 774
  • [25] Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework
    Lu, Weijia
    JOURNAL OF GRID COMPUTING, 2020, 18 (02) : 239 - 250
  • [26] Parallel Fuzzy C-Means Clustering Based Big Data Anonymization Using Hadoop MapReduce
    Lawrance, Josephine Usha
    Jesudhasan, Jesu Vedha Nayahi
    Rittammal, Jerald Beno Thampiraj
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 135 (04) : 2103 - 2130
  • [28] Parallel Selective Algorithms for Nonconvex Big Data Optimization
    Facchinei, Francisco
    Scutari, Gesualdo
    Sagratella, Simone
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2015, 63 (07) : 1874 - 1889
  • [29] Parallel coordinate descent methods for big data optimization
    Richtarik, Peter
    Takac, Martin
    MATHEMATICAL PROGRAMMING, 2016, 156 (1-2) : 433 - 484