A non-parametric method to estimate the number of clusters

Cited by: 47
Authors
Fujita, Andre [1]
Takahashi, Daniel Y. [2,3]
Patriota, Alexandre G. [4]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Dept Comp Sci, BR-05508 Sao Paulo, Brazil
[2] Princeton Univ, Dept Psychol, Princeton, NJ 08544 USA
[3] Princeton Univ, Inst Neurosci, Princeton, NJ 08544 USA
[4] Univ Sao Paulo, Inst Math & Stat, Dept Stat, BR-05508 Sao Paulo, Brazil
Funding
São Paulo Research Foundation, Brazil;
Keywords
Clustering; Silhouette method; k-means; Spectral clustering; Validation; Algorithm;
DOI
10.1016/j.csda.2013.11.012
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
An important and yet unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric, data-driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that the slope statistic outperforms (for the considered examples) some popular methods that have been proposed in the literature. Applications to graph clustering and to the iris and breast cancer datasets are shown. (C) 2013 Elsevier B.V. All rights reserved.
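This record does not state the formula of the slope statistic. The sketch below is offered only as an illustration, and it rests on an assumption. It assumes the statistic is built from the average silhouette width S(k) of a k-cluster partition, which the "Silhouette method" keyword suggests. It also assumes the form s(k) = -[S(k+1) - S(k)] * S(k)^p, with the estimate taken as the k that maximizes s(k). The helper names `mean_silhouette`, `kmeans` and `slope_estimate` are hypothetical, as are the default p = 1 and the deterministic farthest-first k-means initialisation. None of these should be read as the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def mean_silhouette(points: List[Point], labels: List[int]) -> float:
    """Average silhouette width S(k) of a partition, using Euclidean distance."""
    clusters: Dict[int, List[int]] = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton's silhouette is conventionally 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for c, members in clusters.items()
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd k-means with a deterministic farthest-first initialisation (a stand-in)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda q: min(math.dist(q, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def slope_estimate(
    points: List[Point],
    cluster_fn: Callable[[List[Point], int], List[int]],
    k_max: int,
    p: float = 1.0,  # hypothetical default exponent
) -> Tuple[int, Dict[int, float]]:
    """Assumed slope rule: pick argmax over k in [2, k_max] of -[S(k+1) - S(k)] * S(k)^p."""
    sil = {k: mean_silhouette(points, cluster_fn(points, k)) for k in range(2, k_max + 2)}
    # clamp negative S(k) to 0 so that a fractional p never produces a complex number
    slope = {k: -(sil[k + 1] - sil[k]) * max(sil[k], 0.0) ** p for k in range(2, k_max + 1)}
    return max(slope, key=slope.get), slope
```

The clustering routine is passed in as `cluster_fn` because the abstract says the technique works on the output of any clustering algorithm. The k-means here is only a placeholder and could be replaced by, for example, a spectral clustering routine. As a usage example, take three well-separated unit squares: `slope_estimate(blobs, kmeans, k_max=3)` favours k = 3. At that point S(3) is high and drops sharply at k = 4, so the negative slope weighted by S(3) is largest there.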
Pages: 27-39
Number of pages: 13
Related papers
50 in total
  • [1] A Bayesian non-parametric method to detect clusters in Planck data
    Diego, JM
    Vielva, P
    Martínez-González, E
    Silk, J
    Sanz, JL
    [J]. MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2002, 336 (04) : 1351 - 1363
  • [2] Non-parametric inference on the number of equilibria
    Kasy, Maximilian
    [J]. ECONOMETRICS JOURNAL, 2015, 18 (01) : 1 - 39
  • [3] A BAYESIAN NON-PARAMETRIC ESTIMATE FOR MULTIVARIATE REGRESSION
    POLI, I
    [J]. JOURNAL OF ECONOMETRICS, 1985, 28 (02) : 171 - 182
  • [4] A note on ROC analysis and non-parametric estimate of sensitivity
    Zhang, J
    Mueller, ST
    [J]. PSYCHOMETRIKA, 2005, 70 (01) : 203 - 212
  • [5] A note on ROC analysis and non-parametric estimate of sensitivity
    Jun Zhang
    Shane T. Mueller
    [J]. Psychometrika, 2005, 70 : 203 - 212
  • [6] NON-PARAMETRIC ESTIMATE OF A PROBABILITY DENSITY-FUNCTION
    KONAKOV, VD
    [J]. TEORIYA VEROYATNOSTEI I YEYE PRIMENIYA, 1972, 17 (02) : 377 - &
  • [7] A non-parametric estimate of mass 'scoured' in galaxy cores
    Hopkins, Philip F.
    Hernquist, Lars
    [J]. MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2010, 407 (01) : 447 - 457
  • [8] To be parametric or non-parametric, that is the question: Parametric and non-parametric statistical tests
    Van Buren, Eric
    Herring, Amy H.
    [J]. BJOG-AN INTERNATIONAL JOURNAL OF OBSTETRICS AND GYNAECOLOGY, 2020, 127 (05) : 549 - 550
  • [9] Non-parametric method of technological progress
    Guo, JF
    Yang, DL
    [J]. PROCEEDINGS OF '97 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, 1997 : 649 - 653
  • [10] Comparison of reliability techniques of parametric and non-parametric method
    Kalaiselvan, C.
    Rao, L. Bhaskara
    [J]. ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2016, 19 (02) : 691 - 699