Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 2
Authors
Mussabayev, Rustam [1, 2]
Mussabayev, Ravil [1, 3]
Affiliations
[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan
[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan
[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA
Keywords
Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means++; Unsupervised Learning
DOI
10.1007/978-981-97-4985-0_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
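The strategy described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`total_cost`, `lloyd`, `competitive_big_means`), the uniform choice of sample size, the `size_range` bounds, and scoring candidates on the full data set are all assumptions made for illustration. Workers are simulated sequentially here; a parallel version would run each worker in its own process.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def total_cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, iters=10):
    """Run a few Lloyd (K-means) iterations; an empty cluster keeps its old centroid."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def competitive_big_means(data, k, n_workers=4, rounds=15,
                          size_range=(20, 100), seed=0):
    """Toy sketch: workers with random sample sizes compete to improve shared centroids."""
    rng = random.Random(seed)
    best = rng.sample(data, k)           # initial incumbent: k random data points
    best_cost = total_cost(data, best)   # assumption: score on the full (toy) data set
    history = [best_cost]
    wins = {}                            # sample size -> times it produced a new incumbent
    for _ in range(rounds):
        candidates = []
        for _ in range(n_workers):       # simulated workers; a real run would be parallel
            size = rng.randint(*size_range)              # stochastic sample size
            sample = rng.sample(data, min(size, len(data)))
            cents = lloyd(sample, best)                  # start from the shared incumbent
            candidates.append((total_cost(data, cents), size, cents))
        cost, size, cents = min(candidates, key=lambda t: t[0])
        if cost < best_cost:             # the round's winner replaces the incumbent
            best, best_cost = cents, cost
            wins[size] = wins.get(size, 0) + 1
        history.append(best_cost)
    return best, best_cost, history, wins


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    data += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(300)]
    cents, cost, history, wins = competitive_big_means(data, k=2)
    print("centroids:", cents)
    print("winning sample sizes:", wins)
```

The `wins` dictionary records which sample sizes produced improvements, a simple stand-in for the continual analysis of sample-size performance that the abstract mentions. Scoring on the full data set is only practical at toy scale; for large data, a fixed validation subsample would be one possible substitute.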
Pages: 224-236
Number of pages: 13