Fast, Linear Time Hierarchical Clustering using the Baire Metric

Cited by: 12
Authors
Contreras, Pedro [1, 2]
Murtagh, Fionn [3, 4]
Affiliations
[1] Univ London, Dept Comp Sci, Egham TW20 0EX, Surrey, England
[2] ThinkingSafe Ltd, Egham, Surrey, England
[3] Univ London, Dublin, Ireland
[4] Sci Fdn Ireland, Dublin, Ireland
Keywords
Hierarchical clustering; Ultrametric; Redshift; k-means; p-adic; m-adic; Baire; Longest common prefix; ULTRAMETRICITY; DENDROGRAMS; COMPUTATION
DOI
10.1007/s00357-012-9106-3
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast to the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
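As a rough illustration of the idea in the abstract, the sketch below computes a longest-common-prefix distance between digit strings and groups values by shared prefix in a single pass. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the choice of base 10, and the use of pre-formatted digit strings as input are all illustrative choices that do not come from the record above.

```python
from collections import defaultdict


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(a: str, b: str, base: float = 10.0) -> float:
    """Prefix-based distance: base ** -(length of the longest common prefix).

    Assumption: base 10 is used here because the inputs are decimal digit
    strings; identical strings are given distance 0.
    """
    if a == b:
        return 0.0
    return base ** -common_prefix_length(a, b)


def prefix_clusters(digit_strings, depth):
    """Group strings by their first `depth` digits in one pass over the data.

    Each string is touched once, so the cost is linear in the number of
    items. Repeating this for depth = 1, 2, ... yields nested partitions,
    i.e. the levels of a hierarchy.
    """
    buckets = defaultdict(list)
    for s in digit_strings:
        buckets[s[:depth]].append(s)
    return dict(buckets)


# Hypothetical digit strings, e.g. the decimal part of values in [0, 1).
data = ["1234", "1256", "1299", "4571", "4579"]
level1 = prefix_clusters(data, 1)
level2 = prefix_clusters(data, 2)
```

With the hypothetical data above, `level1` has the two groups keyed `"1"` and `"4"`, while `level2` has the groups `"12"` and `"45"`. Because two strings sharing a prefix of length k also share every shorter prefix, the groups at depth k + 1 always refine those at depth k, which is what makes the bucketing hierarchical.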
Pages: 118-143
Number of pages: 26