A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39

Authors:

Dierckens, Karl E. ^{[1]}

Harrison, Adrian B. ^{[1]}

Leung, Carson K. ^{[1]}

Pind, Adrienne V. ^{[1]}

Affiliations:

[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada

Source:

2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;

DOI:

10.1109/Trustcom/BigDataSE/ICESS.2017.332

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown, and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
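
For orientation, the baseline that the abstract contrasts against can be sketched as textbook (Lloyd-style) k-means. This is not the authors' fast heuristic, which the abstract does not describe. The names `k_means`, `squared_distance`, `max_iters`, and `seed` are illustrative assumptions, and the sketch uses only the Python standard library.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    """Baseline Lloyd-style k-means (illustrative); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly selected initial representatives: the risk the abstract mentions.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels
```

Calling `k_means([(0.0,), (1.0,), (10.0,), (11.0,)], k=2)` groups the two low values and the two high values. Each iteration compares every point with every centroid, which is the runtime cost the abstract points to for large inputs.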


Pages: 925-932

Number of pages: 8

50 records in total

[1] The fast clustering algorithm for the big data based on K-means
Xie, Ting
Zhang, Taiping
INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)
[2] Improvement of the Fast Clustering Algorithm Improved by K-Means in the Big Data
Xie, Ting
Liu, Ruihua
Wei, Zhengyuan
APPLIED MATHEMATICS AND NONLINEAR SCIENCES, 2020, 5 (01) : 1 - 10
[3] k-Means Clustering of Lines for Big Data
Marom, Yair
Feldman, Dan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[4] How to Use K-means for Big Data Clustering?
Mussabayev, Rustam
Mladenovic, Nenad
Jarboui, Bassem
Mussabayev, Ravil
PATTERN RECOGNITION, 2023, 137
[5] Modified K-means Algorithm for Big Data Clustering
Sengupta, Debapriya
Roy, Sayantan Singha
Ghosh, Sarbani
Dasgupta, Ranjan
PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 1443 - 1448
[6] Parallel batch k-means for Big data clustering
Alguliyev, Rasim M.
Aliguliyev, Ramiz M.
Sukhostat, Lyudmila V.
COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 152
[7] STiMR k-Means: An Efficient Clustering Method for Big Data
Ben HajKacem, Mohamed Aymen
Ben N'cir, Chiheb-Eddine
Essoussi, Nadia
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (08)
[8] Review on the Research of K-means Clustering Algorithm in Big Data
Chen Jie
Zhang Jiyue
Wu Junhui
Wu Yusheng
Si Huiping
Lin Kaiyan
2020 IEEE THE 3RD INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION ENGINEERING (ICECE), 2020, : 107 - 111
[9] Efficient MapReduce Kernel k-Means for Big Data Clustering
Tsapanos, Nikolaos
Tefas, Anastasios
Nikolaidis, Nikolaos
Pitas, Ioannis
9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016
[10] Canopy with k-means Clustering Algorithm for Big Data Analytics
Sagheer, Noor S.
Yousif, Suhad A.
FOURTH INTERNATIONAL CONFERENCE OF MATHEMATICAL SCIENCES (ICMS 2020), 2021, 2334
