A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39
Authors
Dierckens, Karl E. [1]
Harrison, Adrian B. [1]
Leung, Carson K. [1]
Pind, Adrienne V. [1]
Affiliation
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.332
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown, and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with the k randomly selected representatives, a tendency to produce spherical clusters, and high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
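The abstract does not describe the authors' heuristic itself, so the following is only a minimal sketch of the standard baseline that the listed problems refer to: Lloyd's k-means iteration, with k-means++ seeding swapped in as one common (not the paper's) remedy for poorly chosen random initial representatives. The function names `sq_dist`, `kmeans_pp_init`, and `kmeans` are hypothetical. Two of the abstract's concerns are visible in the code: assigning by squared Euclidean distance to a single centroid favors compact, roughly spherical clusters, and each iteration performs O(n·k·d) distance work for n points in d dimensions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding (a swapped-in technique, not the paper's heuristic):
    each new centroid is drawn with probability proportional to its squared
    distance from the nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:  # every point coincides with a chosen centroid
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Hypothetical sketch of Lloyd's algorithm: alternate the assignment and
    update steps until labels stop changing or max_iter is reached."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = [-1] * len(points)
    dim = len(points[0])
    for _ in range(max_iter):
        # Assignment step: n * k distance computations, each costing O(d).
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster became empty
                centroids[j] = tuple(
                    sum(p[i] for p in members) / len(members) for i in range(dim)
                )
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

On the toy data above, the three points near (1, 1) and the three near (8, 8) end up in two separate clusters; which cluster receives label 0 depends on the seeding. A scalable solution for big data would replace these pure-Python loops with vectorized or distributed computation.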
Pages: 925-932
Number of pages: 8
Related papers
50 in total
  • [31] A K-Means Algorithm Application on Big Data
    Eren, Beste
    Karabulut, Ezgi Cilga
    Alptekin, S. Emre
    Alptekin, Gulfem Isiklar
    WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2015, VOL II, 2015, : 814 - 818
  • [32] An improved K-means algorithm for big data
    Moodi, Fatemeh
    Saadatfar, Hamid
    IET SOFTWARE, 2022, 16 (01) : 48 - 59
  • [33] Bootstrapping K-means for Big data analysis
    Han, Jungkyu
    Luo, Min
    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014, : 591 - 596
  • [34] Soil data clustering by using K-means and fuzzy K-means algorithm
    Hot, Elma
    Popovic-Bugarin, Vesna
    2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 890 - 893
  • [35] Bagged K-means clustering of metabolome data
    Hageman, J. A.
    van den Berg, R. A.
    Westerhuis, J. A.
    Hoefsloot, H. C. J.
    Smilde, A. K.
    CRITICAL REVIEWS IN ANALYTICAL CHEMISTRY, 2006, 36 (3-4) : 211 - 220
  • [36] K-means Data Clustering with Memristor Networks
    Jeong, YeonJoo
    Lee, Jihang
    Moon, John
    Shin, Jong Hoon
    Lu, Wei D.
    NANO LETTERS, 2018, 18 (07) : 4447 - 4453
  • [37] K-Means Extensions for Clustering Categorical Data
    Alwersh, Mohammed
    Kovacs, Laszlo
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09) : 492 - 507
  • [38] New k-Means data clustering approach
    College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, China
    Authors not listed
    J. Comput. Inf. Syst., 2008, 2 (565-570):
  • [39] K-means*: Clustering by gradual data transformation
    Malinen, Mikko I.
    Mariescu-Istodor, Radu
    Franti, Pasi
    PATTERN RECOGNITION, 2014, 47 (10) : 3376 - 3386
  • [40] A Fast and Scalable FPGA-Based Parallel Processing Architecture for K-Means Clustering for Big Data Analysis
    Raghavan, Ramprasad
    Perera, Darshika G.
    2017 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2017,