Subquadratic High-Dimensional Hierarchical Clustering

Cited: 0
Authors
Abboud, Amir [1 ]
Cohen-Addad, Vincent [2 ,3 ]
Houdrouge, Hussein [4 ]
Affiliations
[1] IBM Res, Yorktown Hts, NY 10598 USA
[2] CNRS, Paris, France
[3] Sorbonne Univ, Paris, France
[4] Ecole Polytech, Palaiseau, France
Keywords
DOI
None
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We consider the widely used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high-dimensional Euclidean space, since any such implementation implicitly requires solving the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approximation? More precisely: these algorithms successively merge the clusters that are at closest average distance (for average-linkage), at minimum distance (for single-linkage), or that induce the least sum-of-squares error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm is allowed to merge γ-approximate closest clusters, namely clusters whose average, minimum, or sum-of-squares distance is at most γ times that of the closest clusters. We show that one can indeed take advantage of this relaxation and compute the approximate hierarchical clustering tree using Õ(n) γ-approximate nearest neighbor queries. This leads to an algorithm running in time Õ(nd) + n^{1+O(1/γ)} for d-dimensional Euclidean space. We then provide experiments showing that these algorithms perform as well as the non-approximate versions on classic classification tasks while achieving a significant speed-up.
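The relaxation described in the abstract (merging any pair of clusters whose distance is within a factor γ of the closest pair, rather than the exact closest pair) can be illustrated with a minimal sketch. The following is a naive cubic-time version of γ-approximate average-linkage written for clarity only; it is not the paper's Õ(nd) + n^{1+O(1/γ)} algorithm, which relies on approximate nearest-neighbor data structures. The function name and return format are chosen here for illustration.

```python
import numpy as np

def approx_average_linkage(points, gamma=1.0):
    """Agglomerative average-linkage clustering that may merge any
    gamma-approximate closest pair of clusters (gamma = 1.0 recovers
    exact average-linkage). Returns the merge sequence as a list of
    (cluster_id_a, cluster_id_b) pairs."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []

    def avg_dist(a, b):
        # average pairwise Euclidean distance between two clusters
        return np.mean([np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b]])

    while len(clusters) > 1:
        keys = sorted(clusters)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        dists = {p: avg_dist(*p) for p in pairs}
        d_min = min(dists.values())
        # Any pair within gamma * d_min is an admissible merge under the
        # relaxation; we take the first such pair for determinism.
        a, b = next(p for p in pairs if dists[p] <= gamma * d_min)
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b))
    return merges
```

With γ = 1 this reproduces exact average-linkage; larger γ widens the set of admissible merges, which is exactly the freedom the paper exploits to answer each merge step with fast γ-approximate nearest neighbor queries instead of an exact closest-pair computation.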
Pages: 11