Big Data Clustering with Kernel k-Means: Resources, Time and Performance

Cited by: 2
Authors
Tsapanos, Nikolaos [1]
Tefas, Anastasios [1]
Nikolaidis, Nikolaos [1]
Pitas, Ioannis [1]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, Univ Campus, Box 54 124, Thessaloniki, Greece
Keywords
Big data; kernel k-means; data clustering; approximate kernel k-means; Apache Spark; distributed computation; COMPUTATION; HISTOGRAMS
DOI
10.1142/S0218213018600060
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data clustering is an unsupervised learning task with applications in many scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is k-Means. It is very popular, but it cannot handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick to perform clustering in a higher-dimensional space, thus overcoming the limitation of classic k-Means with respect to non-linearly separable input data. Big Data research, a field that has established itself in the last few years, involves performing tasks on extremely large amounts of data. To meet its challenges, several adaptations of Kernel k-Means have been proposed, each with different requirements in processing power and running time, and each incurring different trade-offs in performance. In this paper, we present several issues and techniques involving the use of Kernel k-Means for Big Data clustering, and examine how each combination of components in a clustering framework fares in terms of resources, time and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
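The abstract describes Kernel k-Means only in words. As a rough illustration, the following is a minimal sketch of the basic algorithm, assuming an RBF kernel and a full kernel matrix held in memory. It is not the authors' implementation, and the names `rbf_kernel` and `kernel_kmeans`, the `gamma` parameter, and the random initialization are assumptions made for this example. The squared feature-space distance from sample i to the centroid of cluster C is computed purely from kernel values as K_ii - (2/|C|) * sum_{j in C} K_ij + (1/|C|^2) * sum_{j,l in C} K_jl.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Full RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Basic Kernel k-Means on a precomputed kernel matrix K (illustrative only)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial assignment (assumption)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue  # an empty cluster stays at infinite distance
            # ||phi(x_i) - m_c||^2 expressed with kernel values only.
            dist[:, c] = (
                diag
                - 2.0 * K[:, mask].sum(axis=1) / size
                + K[np.ix_(mask, mask)].sum() / size ** 2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

A call such as `kernel_kmeans(rbf_kernel(X, gamma=0.5), n_clusters=2)` returns one cluster index per row of `X`. Note that this sketch stores the entire n-by-n kernel matrix, so memory grows quadratically with the number of samples, which is the kind of cost that approximate and distributed variants aim to reduce.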
Pages: 18