RaCoCl: Robust Rank Correlation Based Clustering - An Exploratory Study for High-Dimensional Data

Cited by: 0
Authors
Krone, Martin [1]
Klawonn, Frank [1]
Jayaram, Balasubramaniam [2]
Affiliations
[1] Ostfalia Univ Appl Sci, Wolfenbuettel, Germany
[2] Indian Inst Technol, Hyderabad, Andhra Pradesh, India
Funding
European Research Council
Keywords
Fuzzy Gamma Rank Correlation Coefficient; Clustering; High-dimensional Data; Fuzzy C-Means; ASSOCIATION;
DOI
10.1109/FUZZ-IEEE.2013.6622463
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The curse of dimensionality, which refers to both the combinatorial explosion in dimensions and the concentration of distances or norms in high dimensions, affects most clustering techniques. Recent studies on the concentration of norms suggest using a correlation measure instead of distances to judge (dis)similarity more effectively in high dimensions. In this work, based on these observations, we propose a robust rank correlation based clustering method. Specifically, we employ the recently proposed fuzzy gamma rank correlation measure. We show that this intuitively simple algorithm has the following advantages: (i) it requires very few parameters to be set; (ii) the number of clusters need not be known a priori; (iii) while there is an indirect dependence on the underlying distance measure, it makes use of both global and local information; (iv) it can be robust to noise, depending on the correlation measure employed; and (v) as we show, it performs well with high-dimensional data. We illustrate the algorithm on some datasets where the traditional Fuzzy C-Means algorithm is known to fail.
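To make the idea of judging similarity by rank correlation rather than by distance concrete, here is a minimal sketch of the classical (crisp) Goodman-Kruskal gamma coefficient. It is not the fuzzy gamma measure the abstract refers to, and it is not the RaCoCl algorithm; the function name `goodman_kruskal_gamma`, the treatment of each data point as a sequence of coordinate values, and the convention of returning 0.0 when every pair is tied are assumptions made for illustration only.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Crisp Goodman-Kruskal gamma between two equal-length sequences.

    Illustrative stand-in only: the abstract uses a *fuzzy* gamma
    rank correlation, which this sketch does not implement.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = 0
    discordant = 0
    # Compare every pair of positions and check whether both
    # sequences order that pair the same way.
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # Pairs tied in either sequence (s == 0) are ignored.

    total = concordant + discordant
    if total == 0:
        # Assumed convention: no untied pairs means no evidence either way.
        return 0.0
    return (concordant - discordant) / total


# Two hypothetical points with the same coordinate ordering; the last
# coordinate of b is an outlier in magnitude but not in rank.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 100]
print(goodman_kruskal_gamma(a, b))
print(goodman_kruskal_gamma(a, list(reversed(a))))
```

Because only the ordering of coordinates enters the computation, a single extreme value changes the result no more than any other value that keeps the same rank, which is one intuition behind using rank correlation for robustness.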
Pages: 8