A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39

Authors:

Dierckens, Karl E. [1]

Harrison, Adrian B. [1]

Leung, Carson K. [1]

Pind, Adrienne V. [1]

Affiliations:

[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada

Source:

2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;

DOI:

10.1109/Trustcom/BigDataSE/ICESS.2017.332

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
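
For orientation, the baseline that the abstract contrasts against, standard (Lloyd-style) k-means with randomly selected initial representatives, can be sketched as below. This is not the authors' fast heuristic, which the abstract does not describe. The function names `kmeans` and `squared_distance`, the parameters `max_iters` and `seed`, and the sample points are illustrative assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly selected initial representatives: a poor draw can lead
    # to a poor local optimum, which is the risk the abstract mentions.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[c] = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                )
    return centroids, labels


# Hypothetical toy data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
```

Each iteration compares every point with every centroid, so the cost per pass grows with the number of points times k times the dimensionality. That per-pass cost is what makes the plain algorithm expensive on big data.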


Pages: 925-932

Number of pages: 8

