Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 2

Authors:

Mussabayev, Rustam [1,2]

Mussabayev, Ravil [1,3]

Affiliations:

[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan

[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan

[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA

Source:

INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024 / Vol. 14796

Keywords:

Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means plus; Unsupervised Learning;

DOI:

10.1007/978-981-97-4985-0_18

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.

Pages: 224-236

Number of pages: 13

Related records (50 in total):

[1] Parallel batch k-means for Big data clustering
Alguliyev, Rasim M.
Aliguliyev, Ramiz M.
Sukhostat, Lyudmila V.
COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 152
[2] Parallel Clustering Optimization Algorithm Based on MapReduce in Big Data Mining
Zhang, Huajie
Song, Lei
Zhang, Sen
IAENG International Journal of Applied Mathematics, 2023, 53 (01)
[3] HdK-Means: Hadoop Based Parallel K-Means Clustering for Big Data
Bandyopadhyay, Soumyendu Sekhar
Halder, Anup Kumar
Chatterjee, Piyali
Nasipuri, Mita
Basu, Subhadip
2017 IEEE CALCUTTA CONFERENCE (CALCON), 2017 : 452 - 456
[4] A survey on parallel clustering algorithms for Big Data
Zineb Dafir
Yasmine Lamari
Said Chah Slaoui
Artificial Intelligence Review, 2021, 54 : 2411 - 2443
[5] A survey on parallel clustering algorithms for Big Data
Dafir, Zineb
Lamari, Yasmine
Slaoui, Said Chah
ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (04) : 2411 - 2443
[6] Optimization of Big Data Parallel Scheduling Based on Dynamic Clustering Scheduling Algorithm
Liu, Fang
He, Yanxiang
He, Jing
Gao, Xing
Huang, Feihu
Journal of Signal Processing Systems, 2022, 94 (11) : 1243 - 1251
[7] Optimization of Big Data Parallel Scheduling Based on Dynamic Clustering Scheduling Algorithm
Liu, Fang
He, Yanxiang
He, Jing
Gao, Xing
Huang, Feihu
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2022, 94 (11) : 1243 - 1251
[8] Optimization of Big Data Parallel Scheduling Based on Dynamic Clustering Scheduling Algorithm
Fang Liu
Yanxiang He
Jing He
Xing Gao
Feihu Huang
Journal of Signal Processing Systems, 2022, 94 : 1243 - 1251
[9] k-Means Clustering of Lines for Big Data
Marom, Yair
Feldman, Dan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[10] A Parallel Clustering Algorithm for Power Big Data Analysis
Meng, Xiangjun
Chen, Liang
Li, Yidong
PARALLEL ARCHITECTURE, ALGORITHM AND PROGRAMMING, PAAP 2017, 2017, 729 : 533 - 540
