A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39
Authors
Dierckens, Karl E. [1]
Harrison, Adrian B. [1]
Leung, Carson K. [1]
Pind, Adrienne V. [1]
Affiliations
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.332
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, a tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
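For orientation, the baseline that the abstract refers to can be sketched as plain Lloyd-style k-means. This is a minimal illustration of the standard algorithm, not the fast heuristic proposed in the paper, whose details are not given in this record. The function name `kmeans`, its parameters, and the sample points are assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples.

    Illustrative baseline only; NOT the heuristic from the paper.
    """
    rng = random.Random(seed)
    # Randomly selected initial representatives: the step whose risks
    # the abstract mentions.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance). Cost is O(n * k * d) per pass.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = kmeans(points, k=2)
```

Each pass compares every point with every centroid, which is where the runtime cost noted in the abstract comes from, and the final clusters depend on which initial representatives `rng.sample` happens to pick.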
Pages: 925 - 932
Number of pages: 8
Related papers
50 in total
  • [21] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [22] HdK-Means: Hadoop Based Parallel K-Means Clustering for Big Data
    Bandyopadhyay, Soumyendu Sekhar
    Halder, Anup Kumar
    Chatterjee, Piyali
    Nasipuri, Mita
    Basu, Subhadip
    2017 IEEE CALCUTTA CONFERENCE (CALCON), 2017, : 452 - 456
  • [23] A fast K-Means clustering algorithm based on grid data reduction
    Li, Daqi
    Shen, Junyi
    Chen, Hongmin
    2008 IEEE AEROSPACE CONFERENCE, VOLS 1-9, 2008, : 2273 - +
  • [24] Fast support vector data description using K-means clustering
    Kim, Pyo Jae
    Chang, Hyung Jin
    Song, Dong Sung
    Choi, Jin Young
    ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 3, PROCEEDINGS, 2007, 4493 : 506 - +
  • [25] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
    Rajeswari, K.
    Acharya, Omkar
    Sharma, Mayur
    Kopnar, Mahesh
    Karandikar, Kiran
    1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
  • [26] Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means
    Yu, Zhanqiu
    INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2019, 10 (01) : 1 - 12
  • [27] Balancing effort and benefit of K-means clustering algorithms in Big Data realms
    Perez-Ortega, Joaquin
    Nely Almanza-Ortega, Nelva
    Romero, David
    PLOS ONE, 2018, 13 (09):
  • [28] Efficient and Privacy-Preserving k-means clustering For Big Data Mining
    Gheid, Zakaria
    Challal, Yacine
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 791 - 798
  • [29] Enhancement of K-means clustering in big data based on equilibrium optimizer algorithm
    Al-kababchee, Sarah Ghanim Mahmood
    Algamal, Zakariya Yahya
    Qasim, Omar Saber
    JOURNAL OF INTELLIGENT SYSTEMS, 2023, 32 (01)
  • [30] Study on oceanic big data clustering based on incremental K-means algorithm
    Li Y.
    Yang Z.
    Han K.
    International Journal of Innovative Computing and Applications, 2020, 11 (2-3) : 89 - 95