A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39

Authors:

Dierckens, Karl E. [1]

Harrison, Adrian B. [1]

Leung, Carson K. [1]

Pind, Adrienne V. [1]

Affiliations:

[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada

Source:

2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;

DOI:

10.1109/Trustcom/BigDataSE/ICESS.2017.332

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown, and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, a tendency to produce spherical clusters, and high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.


Pages: 925-932

Page count: 8
