A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39

Authors:

Dierckens, Karl E. [1]

Harrison, Adrian B. [1]

Leung, Carson K. [1]

Pind, Adrienne V. [1]

Affiliation:

[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada

Source:

2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;

DOI:

10.1109/Trustcom/BigDataSE/ICESS.2017.332

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.

Pages: 925 - 932

Page count: 8

50 items in total

[21] Clustering of Image Data Using K-Means and Fuzzy K-Means
Rahmani, Md. Khalid Imam
Pal, Naina
Arora, Kamiya
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
[22] HdK-Means: Hadoop Based Parallel K-Means Clustering for Big Data
Bandyopadhyay, Soumyendu Sekhar
Halder, Anup Kumar
Chatterjee, Piyali
Nasipuri, Mita
Basu, Subhadip
2017 IEEE CALCUTTA CONFERENCE (CALCON), 2017, : 452 - 456
[23] A fast K-Means clustering algorithm based on grid data reduction
Li, Daqi
Shen, Junyi
Chen, Hongmin
2008 IEEE AEROSPACE CONFERENCE, VOLS 1-9, 2008, : 2273 - +
[24] Fast support vector data description using K-means clustering
Kim, Pyo Jae
Chang, Hyung Jin
Song, Dong Sung
Choi, Jin Young
ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 3, PROCEEDINGS, 2007, 4493 : 506 - +
[25] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
Rajeswari, K.
Acharya, Omkar
Sharma, Mayur
Kopnar, Mahesh
Karandikar, Kiran
1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
[26] Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means
Yu, Zhanqiu
INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2019, 10 (01) : 1 - 12
[27] Balancing effort and benefit of K-means clustering algorithms in Big Data realms
Perez-Ortega, Joaquin
Nely Almanza-Ortega, Nelva
Romero, David
PLOS ONE, 2018, 13 (09):
[28] Efficient and Privacy-Preserving k-means clustering For Big Data Mining
Gheid, Zakaria
Challal, Yacine
2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 791 - 798
[29] Enhancement of K-means clustering in big data based on equilibrium optimizer algorithm
Al-kababchee, Sarah Ghanim Mahmood
Algamal, Zakariya Yahya
Qasim, Omar Saber
JOURNAL OF INTELLIGENT SYSTEMS, 2023, 32 (01)
[30] Study on oceanic big data clustering based on incremental K-means algorithm
Li Y.
Yang Z.
Han K.
International Journal of Innovative Computing and Applications, 2020, 11 (2-3) : 89 - 95
