Using a Set of Triangle Inequalities to Accelerate K-means Clustering

Cited by: 2

Authors:

Yu, Qiao [1]

Chen, Kuan-Hsun [1]

Chen, Jian-Jia [1]

Affiliations:

[1] TU Dortmund, Dept Comp Sci, Design Automat Embedded Syst Grp, Dortmund, Germany

Source:

SIMILARITY SEARCH AND APPLICATIONS, SISAP 2020 | 2020 / Vol. 12440

Keywords:

K-means; Clustering accelerating; Triangle inequalities;

DOI:

10.1007/978-3-030-60936-8_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

K-means clustering is a well-known problem in data mining and machine learning. However, the de facto standard, Lloyd's k-means algorithm, spends a large amount of time on distance calculations. Elkan's k-means algorithm, one prominent approach, exploits the triangle inequality to greatly reduce the distance calculations between points and centers while producing exactly the same clustering results with a significant speed improvement, especially on high-dimensional datasets. In this paper, we propose a set of triangle inequalities to enhance the filtering step of Elkan's k-means algorithm. With our new filtering bounds, a filtering-based Elkan (FB-Elkan) is proposed, which preserves the same results as Lloyd's k-means algorithm and additionally prunes unnecessary distance calculations. In addition, a memory-optimized Elkan (MO-Elkan) is provided, in which the space complexity is greatly reduced by trading off the maintenance of lower bounds against run-time efficiency. In evaluations with real-world datasets, FB-Elkan in general accelerates the original Elkan's k-means algorithm on high-dimensional datasets (up to 1.69x), whereas MO-Elkan outperforms the others on low-dimensional datasets (up to 2.48x). Specifically, when the datasets have a large number of points, i.e., n >= 5M, MO-Elkan can still derive the exact clustering results, while the original Elkan's k-means algorithm is not applicable due to memory limitations.
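
The pruning idea the abstract attributes to Elkan's algorithm can be illustrated with a minimal Python sketch. It shows only the classical triangle-inequality filter (if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so d(x, c') need not be computed); it does not reproduce the new bounds of FB-Elkan or MO-Elkan, which the abstract does not spell out. The function names and the toy data are assumptions made for illustration.

```python
import math
import random


def assign_brute_force(points, centers):
    """Lloyd-style assignment: compute every point-to-center distance."""
    labels = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels, len(points) * len(centers)


def assign_with_triangle_filter(points, centers):
    """Assignment that skips centers ruled out by the triangle inequality.

    If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be strictly closer and d(x, c_j) is never computed.
    """
    k = len(centers)
    # Center-to-center distances: k * k values, independent of the number of points.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, computed = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        computed += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned without a point-to-center distance calculation
            d = math.dist(x, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


# Toy data (hypothetical): four well-separated centers, 50 points around each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [
    (cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
    for cx, cy in centers
    for _ in range(50)
]

exact_labels, exact_count = assign_brute_force(points, centers)
fast_labels, fast_count = assign_with_triangle_filter(points, centers)
print(exact_labels == fast_labels, exact_count, fast_count)
```

Both functions return the same labels; the filtered version just reaches them with fewer point-to-center distance calculations. The full Elkan algorithm additionally maintains upper and lower bounds across iterations, which this single-pass sketch omits.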


Pages: 297-311

Page count: 15
