Statistically validated hierarchical clustering: Nested partitions in hierarchical trees

Cited by: 4
Authors
Bongiorno, Christian [1]
Micciche, Salvatore [2]
Mantegna, Rosario N. [2, 3]
Affiliations
[1] Univ Paris Saclay, Lab Math & Informat Syst Complexes, Cent Supelec, 3 Rue Joliot Curie, F-91192 Gif Sur Yvette, France
[2] Univ Palermo, Dipartimento Fis & Chim Emilio Segre, Viale Sci,Ed 18, I-90128 Palermo, Italy
[3] Complex Sci Hub Vienna, Josefstadter Str 39, A-1080 Vienna, Austria
Keywords
Hierarchical trees; Clusters; Partitions; Multivariate series; GENE-EXPRESSION; R-PACKAGE; TESTS; MODEL; SET
DOI
10.1016/j.physa.2022.126933
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
We develop a fast and scalable algorithm for detecting a nested partition in a dendrogram obtained from hierarchical clustering of a multivariate series. Our algorithm provides a p-value for each clade observed in the hierarchical tree. The p-value is obtained by computing many bootstrap replicas of the dissimilarity matrix and by performing a statistical test on each difference between the dissimilarity associated with a given clade and the dissimilarity associated with the clade of its parent node. We demonstrate the efficacy of our algorithm on a set of benchmarks generated by a hierarchically nested factor model. We compare the results obtained by our algorithm with those of Pvclust, a widely used algorithm that pursues a global approach and was originally developed in the context of phylogenetic studies. In our numerical experiments, we focus on the role of multiple hypothesis test correction and on the robustness of the algorithms to inaccuracies and errors in the datasets. We verify that our algorithm is much faster than Pvclust and scales better in both the number of elements and the number of records of the investigated multivariate set. We also apply our algorithm to two empirical datasets, one related to a biological complex system and the other to financial time series. We show that the clusters detected by our methodology are meaningful with respect to some consensus partitioning of the two datasets. (c) 2022 Elsevier B.V. All rights reserved.
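The bootstrap test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the dissimilarity, the linkage, or the exact test, so the sketch assumes a correlation dissimilarity 1 - rho, takes the dissimilarity of a clade to be the mean pairwise dissimilarity among its elements, and estimates the p-value as the fraction of bootstrap replicas in which the parent clade is not more dissimilar than the clade itself. The names `clade_dissimilarity`, `clade_p_value`, and `make_toy_records` are hypothetical, and no multiple hypothesis test correction is applied.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def clade_dissimilarity(records, members):
    """Assumed definition: mean of 1 - rho over all pairs of elements in the clade."""
    if len(members) < 2:
        raise ValueError("a clade needs at least two elements")
    cols = {i: [row[i] for row in records] for i in members}
    pairs = list(combinations(sorted(members), 2))
    return sum(1.0 - pearson(cols[i], cols[j]) for i, j in pairs) / len(pairs)


def clade_p_value(records, clade, parent, n_boot=200, seed=0):
    """Fraction of bootstrap replicas where d(parent) - d(clade) <= 0.

    Records (rows) are resampled with replacement; each replica yields a new
    dissimilarity for the clade and for its parent (assumed test statistic).
    """
    rng = random.Random(seed)
    n = len(records)
    failures = 0
    for _ in range(n_boot):
        sample = [records[rng.randrange(n)] for _ in range(n)]
        diff = clade_dissimilarity(sample, parent) - clade_dissimilarity(sample, clade)
        if diff <= 0:
            failures += 1
    return failures / n_boot


def make_toy_records(n_records, seed=0):
    """Toy data: elements 0 and 1 share a common factor, element 2 is independent."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_records):
        f = rng.gauss(0, 1)
        records.append([f + 0.3 * rng.gauss(0, 1),
                        f + 0.3 * rng.gauss(0, 1),
                        rng.gauss(0, 1)])
    return records


if __name__ == "__main__":
    data = make_toy_records(200, seed=1)
    print(clade_p_value(data, clade={0, 1}, parent={0, 1, 2}))
```

In this toy setting the clade {0, 1} is much tighter than its parent {0, 1, 2}, so a small p-value is expected. A full analysis would repeat the test for every clade of the dendrogram and then correct for multiple hypothesis testing, as the abstract notes.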
Pages: 15