Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths

Cited by: 3
Authors
Murtagh, Fionn [1 ,2 ,3 ]
Contreras, Pedro [4 ]
Affiliations
[1] Goldsmiths Univ London, Dept Comp, London SE14 6NW, England
[2] De Montfort Univ, Leicester, Leics, England
[3] Univ Derby, Dept Comp & Math, Derby, England
[4] Thinking Safe Ltd, Egham TW20 0EX, Surrey, England
DOI
10.1007/978-3-319-25226-1_4
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We study how random projections can be used with large data sets in order (1) to cluster the data using a fast binning approach, characterized in terms of directly inducing a hierarchy through use of the Baire metric; and (2) to select, based on the clusters found, subsets of the original data for further analysis. In this work, we focus on random projection as used for processing high-dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering. We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.
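The ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and the base of the Baire distance (2 here) and the number of digits retained are assumptions chosen for illustration. The sketch projects points onto one random axis, sorts them to obtain a spanning path, and bins the rescaled projections by shared leading decimal digits, so that each longer prefix yields a finer level of a hierarchy.

```python
import random
from decimal import Decimal, ROUND_DOWN


def random_spanning_path(points, seed=0):
    """Project points onto one random axis; the sorted order is a spanning path."""
    rng = random.Random(seed)
    axis = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = [sum(a * x for a, x in zip(axis, p)) for p in points]
    order = sorted(range(len(points)), key=lambda i: proj[i])
    return order, proj


def rescale(values):
    """Map values into [0, 1) so that their leading digits are comparable."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span * 0.999999 for v in values]


def digits(x, precision):
    """First `precision` decimal digits of x in [0, 1), truncated, not rounded."""
    quantum = Decimal(1).scaleb(-precision)
    d = Decimal(repr(x)).quantize(quantum, rounding=ROUND_DOWN)
    return format(d, "f").split(".")[1]


def baire_distance(x, y, precision=4, base=2):
    """base**(-k), where k is the length of the longest common digit prefix.

    k == 0 gives distance 1. Values that agree on all retained digits
    give base**(-precision), the smallest distance at this precision.
    """
    k = 0
    for a, b in zip(digits(x, precision), digits(y, precision)):
        if a != b:
            break
        k += 1
    return float(base) ** (-k)


def bin_by_prefix(values, level, precision=4):
    """Group indices by their first `level` digits: one pass, no pairwise distances."""
    bins = {}
    for i, v in enumerate(values):
        bins.setdefault(digits(v, precision)[:level], []).append(i)
    return bins


values = [0.3742, 0.3751, 0.1200, 0.3999]
print(baire_distance(0.3742, 0.3751))  # 0.25 (shared prefix "37")
print(bin_by_prefix(values, 1))        # {'3': [0, 1, 3], '1': [2]}
print(bin_by_prefix(values, 2))        # {'37': [0, 1], '12': [2], '39': [3]}
```

To cluster multidimensional data with this sketch, one would pass the output of `rescale(proj)` to `bin_by_prefix`. The binning makes a single pass over the data, which is the property that makes this kind of approach attractive for large data sets.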
Pages: 43-52
Page count: 10