Clustering with a new distance measure based on a dual-rooted tree

Cited by: 27
Authors
Galluccio, Laurent [1]
Michel, Olivier [2]
Comon, Pierre [2]
Kliger, Mark [3]
Hero, Alfred O. [4]
Affiliations
[1] Observ Cote Azur, Lab Lagrange, F-06304 Nice 4, France
[2] Gipsa Lab UMR 5216, F-38402 St Martin Dheres, France
[3] Omek Interact Ltd, Har Tuv A, Bet Shemesh, Israel
[4] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Funding
U.S. National Science Foundation
Keywords
Non-metric clustering; Minimal spanning tree; Prim's algorithm; Affinity measure; Co-association measure; Consensus clustering; THEORETIC APPROACH; IMAGE; SIMILARITY; CONSENSUS;
DOI
10.1016/j.ins.2013.05.040
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces a novel distance measure for clustering high-dimensional data based on the hitting time of two Minimal Spanning Trees (MSTs) grown sequentially from a pair of points by Prim's algorithm. When the proposed measure is used in conjunction with spectral clustering, we obtain a powerful clustering algorithm that is able to separate neighboring non-convex shaped clusters and to account for local as well as global geometric features of the data set. Remarkably, the new distance measure is a true metric even if the Prim algorithm uses a non-metric dissimilarity measure to compute the edges of the MST. This metric property brings added flexibility to the proposed method. In particular, the method is applied to clustering non-Euclidean quantities, such as probability distributions or spectra, using the Kullback-Leibler divergence as a base measure. We reduce computational complexity by applying consensus clustering to a small ensemble of dual-rooted MSTs. We show that the resultant consensus spectral clustering with dual-rooted MSTs is competitive with other clustering methods, both in terms of clustering performance and computational complexity. We illustrate the proposed clustering algorithm on public-domain benchmark data for which the ground truth is known and on real-world astrophysical data. (c) 2013 Elsevier Inc. All rights reserved.
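The dual-rooted growth described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dual_rooted_hitting`, the growth rule (at each step, add the globally cheapest edge leaving either tree), and the choice to report both the step count and the weight of the colliding edge are all assumptions made here for illustration. The exact definition of the distance measure is given in the paper itself.

```python
import math


def dual_rooted_hitting(D, root_a, root_b):
    """Grow two Prim trees from root_a and root_b until they meet.

    D is a symmetric dissimilarity matrix (list of lists); it need not
    satisfy the triangle inequality.
    Assumed growth rule (hypothetical, for illustration): at each step the
    globally cheapest edge leaving either tree is added.
    Returns (steps, last_weight): the number of edges added up to and
    including the edge that joins the two trees, and that edge's weight.
    """
    n = len(D)
    if root_a == root_b:
        return 0, 0.0
    owner = [None] * n          # which tree (0 or 1) owns each vertex
    owner[root_a] = 0
    owner[root_b] = 1
    steps = 0
    while True:
        best_w, best_u, best_v = math.inf, None, None
        for u in range(n):
            if owner[u] is None:
                continue
            for v in range(n):
                # Skip self-loops and edges inside the same tree.
                if owner[v] == owner[u]:
                    continue
                if D[u][v] < best_w:
                    best_w, best_u, best_v = D[u][v], u, v
        steps += 1
        if owner[best_v] is not None:
            # The cheapest edge reaches the other tree: collision.
            return steps, best_w
        owner[best_v] = owner[best_u]


# Toy example: five points on a line forming two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
D = [[abs(p - q) for q in points] for p in points]
within = dual_rooted_hitting(D, 0, 1)    # roots in the same group
across = dual_rooted_hitting(D, 0, 4)    # roots in different groups
```

In this toy example the two trees meet after a single step when both roots lie in the same group, whereas roots in different groups first absorb their own neighbors and collide only across the large gap. A non-metric base dissimilarity, such as a divergence between distributions, could be substituted for `D` without changing the procedure.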
Pages: 96-113
Number of pages: 18
Related papers
  • [1] CLUSTERING ON MANIFOLDS WITH DUAL-ROOTED MINIMAL SPANNING TREES
    Galluccio, L.
    Michel, O.
    Comon, P.
    [J]. 18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010), 2010, : 1194 - 1198
  • [2] A DUAL-ROOTED MAXILLARY CENTRAL INCISOR
    SINAI, IH
    LUSTBADER, S
    [J]. JOURNAL OF ENDODONTICS, 1984, 10 (03) : 105 - 106
  • [3] A New Distance Measure for Hierarchical Clustering
    Yavuz, Hasan Serhan
    Cevikalp, Hakan
    [J]. 2008 IEEE 16TH SIGNAL PROCESSING, COMMUNICATION AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2008, : 84 - 87
  • [4] A New Distance Measure for Model-Based Sequence Clustering
    Garcia-Garcia, Dario
    Parrado Hernandez, Emilio
    Diaz-de Maria, Fernando
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (07) : 1325 - U183
  • [5] K-Modes clustering algorithm based on a new distance measure
    Liang, Jiye
    Bai, Liang
    Cao, Fuyuan
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2010, 47 (10): : 1749 - 1755
  • [6] A new nonparametric interpoint distance-based measure for assessment of clustering
    Modak, Soumita
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2022, 92 (05) : 1062 - 1077
  • [7] A genetic clustering technique using a new line symmetry based distance measure
    Saha, Sriparna
    Bandyopadhyay, Sanghamitra
    [J]. ADCOM 2007: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATIONS, 2007, : 365 - 370
  • [8] A new sequence distance measure for phylogenetic tree construction
    Otu, HH
    Sayood, K
    [J]. BIOINFORMATICS, 2003, 19 (16) : 2122 - 2130
  • [9] A dual distance based spatial clustering method
    Li, Guang-Qiang
    Deng, Min
    Cheng, Tao
    Zhu, Jian-Jun
    [J]. Cehui Xuebao/Acta Geodaetica et Cartographica Sinica, 2008, 37 (04): : 482 - 488
  • [10] Clustering Speech Samples based on Relative Distance Measure
    Kopparapu, Sunil Kumar
    Pandharipande, Meghna
    [J]. IMCIC 2010: INTERNATIONAL MULTI-CONFERENCE ON COMPLEXITY, INFORMATICS AND CYBERNETICS, VOL II, 2010, : 191 - 195