Finding multiple stable clusterings

Cited by: 0
Authors
Juhua Hu
Qi Qian
Jian Pei
Rong Jin
Shenghuo Zhu
Affiliations
[1] Simon Fraser University, School of Computing Science
[2] Alibaba Group
Keywords
Multi-clustering; Clustering stability; Laplacian eigengap; Feature subspace
DOI
Not available
Abstract
Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has found many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: how to model the quality of clusterings, and how to find multiple stable clusterings (MSC). We introduce to multi-clustering the notion of clustering stability based on the Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We mathematically prove that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method, MSC. One advantage of our method over state-of-the-art multi-clustering methods is that it provides users with a feature subspace for understanding each clustering solution. Another advantage is that MSC does not require users to specify the number of clusters or the number of alternative clusterings, both of which are usually difficult to choose without guidance. Our method can heuristically estimate the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that clearly demonstrates the effectiveness of our method.
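To make the eigengap notion in the abstract concrete, here is a minimal sketch of the standard Laplacian eigengap and the classic eigengap heuristic for choosing the number of clusters. It is not the authors' MSC method, which additionally searches for feature subspaces and multiple clusterings. The sketch assumes a Gaussian (RBF) similarity with bandwidth sigma and the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2). The function names, parameters and toy data are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian (RBF) similarity matrix with a zero diagonal (no self-loops)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def laplacian_eigenvalues(W):
    """Eigenvalues, in ascending order, of L = I - D^(-1/2) W D^(-1/2)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)  # L is symmetric; eigvalsh returns ascending order


def eigengap(W, k):
    """Gap lambda_{k+1} - lambda_k between the k-th and (k+1)-th smallest eigenvalues."""
    lam = laplacian_eigenvalues(W)
    return lam[k] - lam[k - 1]


def estimate_num_clusters(W, k_max=10):
    """Eigengap heuristic: the k in 1..k_max with the largest gap lambda_{k+1} - lambda_k."""
    lam = laplacian_eigenvalues(W)
    k_max = min(k_max, len(lam) - 1)
    gaps = lam[1:k_max + 1] - lam[:k_max]
    return int(np.argmax(gaps)) + 1


# Usage: three Gaussian blobs, once well separated and once heavily overlapping.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])


def make_blobs(scale, n_per=20, std=0.5):
    """Hypothetical toy data: n_per points around each scaled center."""
    return np.vstack([scale * c + std * rng.standard_normal((n_per, 2)) for c in centers])


W_sep = gaussian_similarity(make_blobs(scale=10.0))
W_mix = gaussian_similarity(make_blobs(scale=1.0))
print("estimated k (separated):", estimate_num_clusters(W_sep))
print("eigengap at k=3, separated:", eigengap(W_sep, 3))
print("eigengap at k=3, overlapping:", eigengap(W_mix, 3))
```

On toy data like this, one would expect the well-separated blobs to yield a much larger gap at k = 3 than the overlapping ones. That matches the abstract's claim that a larger eigengap indicates a more stable clustering. Choosing the k with the largest gap is the usual way an eigengap criterion lets a method avoid asking the user for the number of clusters.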
Pages: 991-1021
Number of pages: 30
Related papers (50 in total)
  • [31] Matching and visualization of multiple overlapping clusterings of microarray data
    Krumpelman, Chase
    Ghosh, Joydeep
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2007, : 121 - +
  • [32] Consensus Methods for Combining Multiple Clusterings of Chemical Structures
    Saeed, Faisal
    Salim, Naomie
    Abdo, Ammar
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2013, 53 (05) : 1026 - 1034
  • [33] Discovering Multiple Co-Clusterings With Matrix Factorization
    Wang, Jun
    Wang, Xing
    Yu, Guoxian
    Domeniconi, Carlotta
    Yu, Zhiwen
    Zhang, Zili
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (07) : 3576 - 3587
  • [34] Deep Incomplete Multi-View Multiple Clusterings
    Wei, Shaowei
    Wang, Jun
    Yu, Guoxian
    Domeniconi, Carlotta
    Zhang, Xiangliang
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 651 - 660
  • [35] EpiMC: Detecting Epistatic Interactions Using Multiple Clusterings
    Wang, Jun
    Zhang, Huiling
    Ren, Wei
    Guo, Maozu
    Yu, Guoxian
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (01) : 243 - 254
  • [36] Semi-supervised classification using multiple clusterings
    Yu G.X.
    Feng L.
    Yao G.J.
    Wang J.
    Izdatel'stvo Nauka, (26): 681 - 687
  • [37] Clustering trees: a visualization for evaluating clusterings at multiple resolutions
    Zappia, Luke
    Oshlack, Alicia
    GIGASCIENCE, 2018, 7 (07):
  • [38] Combining multiple clusterings using fast simulated annealing
    Lu, Zhiwu
    Peng, Yuxin
    Ip, Horace H. S.
    PATTERN RECOGNITION LETTERS, 2011, 32 (15) : 1956 - 1961
  • [39] Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information
    Navlakha, Saket
    White, James
    Nagarajan, Niranjan
    Pop, Mihai
    Kingsford, Carl
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS, 2009, 5541 : 400 - +
  • [40] Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information
    Navlakha, Saket
    White, James
    Nagarajan, Niranjan
    Pop, Mihai
    Kingsford, Carl
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2010, 17 (03) : 503 - 516