Fast, Linear Time Hierarchical Clustering using the Baire Metric

Cited by: 12
Authors
Contreras, Pedro [1, 2]
Murtagh, Fionn [3, 4]
Affiliations
[1] Univ London, Dept Comp Sci, Egham TW20 0EX, Surrey, England
[2] ThinkingSafe Ltd, Egham, Surrey, England
[3] Univ London, Dublin, Ireland
[4] Sci Fdn Ireland, Dublin, Ireland
Keywords
Hierarchical clustering; Ultrametric; Redshift; k-means; p-adic; m-adic; Baire; Longest common prefix; ULTRAMETRICITY; DENDROGRAMS; COMPUTATION;
DOI
10.1007/s00357-012-9106-3
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast to the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey, using in this work about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e., we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
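As an illustration of the idea summarized in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes values in [0, 1), a base-10 digit expansion truncated to a fixed precision, and a distance of base^(-k), where k is the length of the longest common prefix of the two digit strings. The function names (fractional_digits, baire_distance, baire_hierarchy) and the precision and base parameters are hypothetical choices made for this example. Each value is assigned to its prefix buckets independently of all other values, so for a fixed precision the hierarchy is built in a single pass over the data.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def fractional_digits(value, precision):
    """Return the first `precision` decimal digits of a value in [0, 1) as a string.

    Assumption: precision >= 1 and 0 <= value < 1. Decimal is used so that
    truncation is not disturbed by binary floating-point rounding.
    """
    step = Decimal(10) ** -precision
    truncated = Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)
    return f"{truncated:f}".split(".")[1]


def common_prefix_length(a, b):
    """Length of the longest common prefix of two strings."""
    k = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        k += 1
    return k


def baire_distance(x, y, precision=4, base=10):
    """Longest-common-prefix distance: base**(-k) for k shared leading digits.

    Values that agree on all `precision` digits are treated as identical
    at this precision and get distance 0 (an assumption of this sketch).
    """
    k = common_prefix_length(
        fractional_digits(x, precision), fractional_digits(y, precision)
    )
    if k == precision:
        return 0.0
    return float(base) ** -k


def baire_hierarchy(values, precision=4):
    """Bucket values by digit prefix; level k groups values sharing k digits.

    One pass over the data, so the cost is linear in len(values)
    for a fixed precision.
    """
    levels = [defaultdict(list) for _ in range(precision)]
    for v in values:
        d = fractional_digits(v, precision)
        for k in range(1, precision + 1):
            levels[k - 1][d[:k]].append(v)
    return levels


if __name__ == "__main__":
    data = [0.1234, 0.1256, 0.1999, 0.7310]
    print(baire_distance(0.1234, 0.1256))
    for k, level in enumerate(baire_hierarchy(data, precision=2), start=1):
        print(k, dict(level))
```

In this sketch the levels nest by construction: every bucket at level k+1 is contained in exactly one bucket at level k, which is what makes the resulting structure a hierarchy without any pairwise distance computation.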
Pages: 118-143
Page count: 26
Related papers
50 in total
  • [21] Hierarchical Linear Dynamical Systems: A new model for clustering of time series
    Cinar, Goktug T.
    Loza, Carlos A.
    Principe, Jose C.
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 2464 - 2470
  • [22] genieclust: Fast and robust hierarchical clustering
    Gagolewski, Marek
    SOFTWAREX, 2021, 15
  • [23] Fast hierarchical clustering and its validation
    Dash, M
    Liu, H
    Scheuermann, P
    Tan, KL
    DATA & KNOWLEDGE ENGINEERING, 2003, 44 (01) : 109 - 138
  • [24] pPOP: Fast yet accurate parallel hierarchical clustering using partitioning
    Dash, Manoranjan
    Petrutiu, Simona
    Scheuermann, Peter
    DATA & KNOWLEDGE ENGINEERING, 2007, 61 (03) : 563 - 578
  • [25] An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA
    Shalom, S. A. Arul
    Dash, Manoranjan
    Tue, Minh
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PROCEEDINGS, 2010, 6119 : 35 - +
  • [26] Fast hierarchical clustering algorithm using locality-sensitive hashing
    Koga, H
    Ishibashi, T
    Watanabe, T
    DISCOVERY SCIENCE, PROCEEDINGS, 2004, 3245 : 114 - 128
  • [27] Hierarchical Clustering Given Confidence Intervals of Metric Distances
    Huang, Weiyu
    Ribeiro, Alejandro
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (10) : 2600 - 2615
  • [28] AXIOMATIC HIERARCHICAL CLUSTERING GIVEN INTERVALS OF METRIC DISTANCES
    Huang, Weiyu
    Ribeiro, Alejandro
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4227 - 4231
  • [29] A hierarchical multi-metric framework for item clustering
    Kotouza, Maria Th.
    Vavliakis, Konstantinos N.
    Psomopoulos, Fotis E.
    Mitkas, Pericles A.
    2018 IEEE/ACM 5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING APPLICATIONS AND TECHNOLOGIES (BDCAT), 2018, : 191 - 197
  • [30] Fast, Accurate Spectral Clustering Using Locally Linear Landmarks
    Vladymyrov, Max
    Carreira-Perpinan, Miguel A.
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 3870 - 3879