Using a Set of Triangle Inequalities to Accelerate K-means Clustering

Cited by: 2
Authors
Yu, Qiao [1]
Chen, Kuan-Hsun [1]
Chen, Jian-Jia [1]
Affiliation
[1] TU Dortmund, Dept Comp Sci, Design Automat Embedded Syst Grp, Dortmund, Germany
Keywords
K-means; Clustering accelerating; Triangle inequalities;
DOI
10.1007/978-3-030-60936-8_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
K-means clustering is a well-known problem in data mining and machine learning. However, the de facto standard, Lloyd's k-means algorithm, spends a large share of its running time on distance calculations. Elkan's k-means algorithm, one prominent approach, exploits the triangle inequality to greatly reduce the number of distance calculations between points and centers. It produces exactly the same clustering results with a significant speed improvement, especially on high-dimensional datasets. In this paper, we propose a set of triangle inequalities to enhance the filtering step of Elkan's k-means algorithm. With our new filtering bounds, we propose a filtering-based Elkan (FB-Elkan), which preserves the same results as Lloyd's k-means algorithm and additionally prunes unnecessary distance calculations. In addition, we provide a memory-optimized Elkan (MO-Elkan), which greatly reduces the space complexity by trading off the maintenance of lower bounds against run-time efficiency. In evaluations with real-world datasets, FB-Elkan in general accelerates the original Elkan's k-means algorithm on high-dimensional datasets (up to 1.69x), whereas MO-Elkan outperforms the others on low-dimensional datasets (up to 2.48x). Specifically, when the datasets have a large number of points, i.e., n >= 5M, MO-Elkan can still derive the exact clustering results, while the original Elkan's k-means algorithm is not applicable due to memory limitations.
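The filtering idea the abstract refers to can be illustrated with the classical triangle-inequality test used in Elkan-style pruning: if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so the distance from x to c' need not be computed. The sketch below is a minimal illustration of that single test only. It is not the FB-Elkan or MO-Elkan method, whose additional bounds are not stated in the abstract, and it omits the upper and lower bounds that Elkan's algorithm maintains across iterations. The function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def assign_bruteforce(points, centers):
    """Lloyd-style assignment: compute every point-to-center distance."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), points.shape[0] * centers.shape[0]


def assign_with_pruning(points, centers):
    """Assignment that skips centers ruled out by the triangle inequality."""
    # Center-to-center distances, computed once per assignment pass.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(points), dtype=int)
    computed = 0  # number of point-to-center distances actually evaluated
    for i, x in enumerate(points):
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        computed += 1
        for j in range(1, len(centers)):
            # d(c_best, c_j) <= d(x, c_best) + d(x, c_j), hence
            # d(c_best, c_j) >= 2 * d(x, c_best) implies d(x, c_j) >= d(x, c_best).
            if cc[best, j] >= 2.0 * best_d:
                continue  # c_j cannot be strictly closer than the current best
            d = np.linalg.norm(x - centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels, computed


# Hypothetical data: three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.concatenate([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])

labels_full, n_full = assign_bruteforce(points, centers)
labels_pruned, n_pruned = assign_with_pruning(points, centers)
print("same assignment:", np.array_equal(labels_full, labels_pruned))
print("distance calculations:", n_full, "vs", n_pruned)
```

On this toy input both functions should return the same assignment, while the pruned version evaluates fewer point-to-center distances. How much is saved depends on how well separated the centers are, so the counts here say nothing about the speedups reported in the abstract.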
Pages: 297-311
Number of pages: 15