A Comparative Performance Analysis of Fast K-Means Clustering Algorithms

Authors
Beecks, Christian [1]
Berns, Fabian [1]
Huewel, Jan David [1]
Linxen, Andrea [1]
Schlake, Georg Stefan [1]
Duesterhus, Tim [2]
Affiliations
[1] Univ Hagen, Hagen, Germany
[2] Univ Munster, Munster, Germany
Keywords
Data mining; Clustering; Performance evaluation
DOI
10.1007/978-3-031-21047-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data clustering is a fundamental and widespread problem in computer science, which has become very attractive in both scientific communities and application domains. Among the different algorithmic methods, the k-means algorithm and its prominent implementation, the Lloyd algorithm, have developed into a de facto standard for partitioning-based clustering. This algorithm, however, turns out to be inefficient on very large databases. In order to mitigate this efficiency issue, several fast k-means algorithms for ad-hoc and exact data clustering have been proposed in the literature. Since their inner workings and applied pruning criteria differ, it is difficult to predict the efficiency of individual algorithms in certain application scenarios. We thus present a performance analysis of existing fast k-means algorithms. We focus on simple interpretability and comparability and abstract from many implementation details so as to provide a guide for data scientists and practitioners alike.
Pages: 119-125
Number of pages: 7