Statistically validated hierarchical clustering: Nested partitions in hierarchical trees

Times cited: 4
Authors
Bongiorno, Christian [1]
Micciche, Salvatore [2]
Mantegna, Rosario N. [2,3]
Affiliations
[1] Univ Paris Saclay, Lab Math & Informat Syst Complexes, Cent Supelec, 3 Rue Joliot Curie, F-91192 Gif Sur Yvette, France
[2] Univ Palermo, Dipartimento Fis & Chim Emilio Segre, Viale Sci,Ed 18, I-90128 Palermo, Italy
[3] Complex Sci Hub Vienna, Josefstadter Str 39, A-1080 Vienna, Austria
Keywords
Hierarchical trees; Clusters; Partitions; Multivariate series; GENE-EXPRESSION; R-PACKAGE; TESTS; MODEL; SET
DOI
10.1016/j.physa.2022.126933
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
We develop a fast and scalable algorithm for detecting a nested partition in the dendrogram obtained from hierarchical clustering of a multivariate series. Our algorithm assigns a p-value to each clade observed in the hierarchical tree. The p-value is obtained by computing many bootstrap replicas of the dissimilarity matrix and by performing a statistical test on each difference between the dissimilarity associated with a given clade and that of the clade of its parent node. We demonstrate the efficacy of our algorithm on a set of benchmarks generated by a hierarchically nested factor model. We compare the results of our algorithm with those of Pvclust, a widely used algorithm that pursues a global approach and was originally developed in the context of phylogenetic studies. In our numerical experiments, we focus on the role of multiple hypothesis test correction and on the robustness of the algorithms to inaccuracies and errors in the datasets. We verify that our algorithm is much faster than Pvclust and scales better both in the number of elements and in the number of records of the investigated multivariate set. We also apply our algorithm to two empirical datasets, one from a complex biological system and the other consisting of financial time series. We show that the clusters detected by our methodology are meaningful with respect to consensus partitions of the two datasets. (c) 2022 Elsevier B.V. All rights reserved.
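The per-clade bootstrap test summarized in the abstract can be illustrated with a minimal, standard-library Python sketch. It rests on assumptions that the abstract does not confirm: the dissimilarity is taken as d_ij = 1 - rho_ij (Pearson correlation), the "dissimilarity associated with a clade" is taken as the mean pairwise dissimilarity among its members, the test is one-sided (a genuine clade should be tighter than its parent), and Bonferroni stands in for whatever multiple-testing correction the paper actually uses. All function names (`clade_pvalues`, `bonferroni`, and so on) are hypothetical; this is not the authors' implementation.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def dissimilarity(series, cols):
    """Assumed dissimilarity d_ij = 1 - rho_ij, computed on the records in `cols`."""
    n = len(series)
    sub = [[row[t] for t in cols] for row in series]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = 1.0 - pearson(sub[i], sub[j])
    return d


def clade_dissimilarity(d, clade):
    """Assumed clade dissimilarity: mean pairwise d_ij over members (needs >= 2)."""
    members = sorted(clade)
    return statistics.fmean(
        d[i][j] for k, i in enumerate(members) for j in members[k + 1:]
    )


def clade_pvalues(series, parent_of, n_boot=200, seed=0):
    """Bootstrap p-value for each clade against its parent clade.

    `parent_of` maps each tested clade (a frozenset of element indices) to the
    frozenset of its parent node. Records (columns) are resampled with
    replacement to build each bootstrap replica of the dissimilarity matrix;
    the tree itself is held fixed.
    """
    rng = random.Random(seed)
    n_records = len(series[0])
    exceed = {clade: 0 for clade in parent_of}
    for _ in range(n_boot):
        cols = [rng.randrange(n_records) for _ in range(n_records)]
        d = dissimilarity(series, cols)
        for clade, parent in parent_of.items():
            # One-sided test (assumption): a genuine clade should be tighter than
            # its parent, so count replicas where this difference is not positive.
            if clade_dissimilarity(d, parent) - clade_dissimilarity(d, clade) <= 0:
                exceed[clade] += 1
    # Add-one estimate keeps the bootstrap p-value strictly positive.
    return {c: (k + 1) / (n_boot + 1) for c, k in exceed.items()}


def bonferroni(pvals):
    """Bonferroni correction; the paper's actual correction may differ."""
    m = len(pvals)
    return {c: min(1.0, p * m) for c, p in pvals.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    T = 100
    f1 = [rng.gauss(0, 1) for _ in range(T)]
    f2 = [rng.gauss(0, 1) for _ in range(T)]
    # Elements 0,1 share factor f1; elements 2,3 share factor f2.
    series = [[f + 0.3 * rng.gauss(0, 1) for f in fac] for fac in (f1, f1, f2, f2)]
    root, a, b = frozenset(range(4)), frozenset({0, 1}), frozenset({2, 3})
    print(bonferroni(clade_pvalues(series, {a: root, b: root})))
```

In this sketch the tree is built once and only the dissimilarity matrix is resampled, so no replica requires re-clustering; each tested clade is compared only with its own parent node, which is the local flavour of test the abstract contrasts with Pvclust's global approach.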
Pages: 15