Model-based evaluation of clustering validation measures

Cited by: 138
Authors
Brun, Marcel
Sima, Chao
Hua, Jianping
Lowey, James
Carroll, Brent
Suh, Edward
Dougherty, Edward R.
Affiliations
[1] Texas A&M Univ, Dept Elect Engn, College Stn, TX 77840 USA
[2] Translat Genom Res Inst, Phoenix, AZ USA
[3] Rice Univ, Dept Elect & Comp Engn, Houston, TX 77251 USA
[4] Univ Texas, MD Anderson Canc Ctr, Dept Pathol, Houston, TX 77030 USA
Keywords
clustering algorithms; clustering errors; validation indices
DOI
10.1016/j.patcog.2006.06.026
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A cluster operator takes a set of data points and partitions the points into clusters (subsets). As with any scientific model, the scientific content of a cluster operator lies in its ability to predict results. This ability is measured by its error rate relative to cluster formation. To estimate the error of a cluster operator, a sample of point sets is generated, the algorithm is applied to each point set, the resulting clusters are evaluated relative to the known partition according to the distributions, and the errors are then averaged over the point sets composing the sample. Many validity measures have been proposed for evaluating clustering results based on a single realization of the random-point-set process. In this paper we consider a number of proposed validity measures and examine how well they correlate with error rates across a number of clustering algorithms and random-point-set models. Validity measures fall broadly into three classes: internal validation is based on calculating properties of the resulting clusters; relative validation is based on comparisons of partitions generated by the same algorithm with different parameters or different subsets of the data; and external validation compares the partition generated by the clustering algorithm with a given partition of the data. To quantify the degree of similarity between the validation indices and the clustering errors, we use Kendall's rank correlation between their values. Our results indicate that, overall, the performance of validity indices is highly variable. For complex models or when a clustering algorithm yields complex clusters, both the internal and relative indices fail to predict the error of the algorithm. Some external indices appear to perform well, whereas others do not.
We conclude that one should not put much faith in a validity score unless there is evidence, either in terms of sufficient data for model estimation or prior model knowledge, that a validity measure is well-correlated to the error rate of the clustering algorithm. (c) 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
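The evaluation protocol described in the abstract can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Every specific choice here is a hypothetical stand-in: the random-point-set model (two unit-variance 1-D Gaussians), the cluster operator (1-D 2-means), the internal index (within-cluster over total sum of squares), the clustering-error definition (misclassification rate under the best matching of predicted to true clusters), and the use of Kendall's tau-a. All function names are likewise hypothetical. The paper's exact error definition and correlation protocol may differ.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def clustering_error(true_labels: Sequence[int], pred_labels: Sequence[int]) -> float:
    """Fraction of points misassigned under the best one-to-one matching of
    predicted clusters to true clusters (brute force, so only for small k)."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Surplus predicted clusters map to a dummy label that never matches.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in itertools.permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return 1.0 - best / len(true_labels)


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / (n choose 2).
    Tied pairs count as neither concordant nor discordant."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have equal length >= 2")
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[i] - x[j]) * (y[i] - y[j])
            s += (d > 0) - (d < 0)
    return s / (n * (n - 1) / 2)


def sample_point_set(rng: random.Random, sep: float, m: int = 50) -> Tuple[List[float], List[int]]:
    """Hypothetical model: two unit-variance 1-D Gaussians centred at 0 and sep,
    returned with the known partition (labels 0 and 1)."""
    pts = [rng.gauss(0.0, 1.0) for _ in range(m)] + [rng.gauss(sep, 1.0) for _ in range(m)]
    return pts, [0] * m + [1] * m


def two_means_1d(pts: Sequence[float], iters: int = 20) -> List[int]:
    """Hypothetical cluster operator: Lloyd's 2-means in 1-D, seeded at min/max."""
    c0, c1 = min(pts), max(pts)
    labels = [0] * len(pts)
    if c0 == c1:
        return labels
    for _ in range(iters):
        labels = [0 if abs(p - c0) <= abs(p - c1) else 1 for p in pts]
        g0 = [p for p, lab in zip(pts, labels) if lab == 0]
        g1 = [p for p, lab in zip(pts, labels) if lab == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def wss_ratio(pts: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical internal index: within-cluster SS / total SS (lower = tighter)."""
    mean = sum(pts) / len(pts)
    total = sum((p - mean) ** 2 for p in pts)
    within = 0.0
    for k in set(labels):
        g = [p for p, lab in zip(pts, labels) if lab == k]
        mk = sum(g) / len(g)
        within += sum((p - mk) ** 2 for p in g)
    return within / total


def estimate_operator_error(sep: float, n_sets: int = 100, seed: int = 0) -> float:
    """Monte Carlo estimate of the operator's error: average the per-realization
    error over independently sampled point sets from the model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sets):
        pts, truth = sample_point_set(rng, sep)
        total += clustering_error(truth, two_means_1d(pts))
    return total / n_sets


def index_error_correlation(seps: Sequence[float], seed: int = 0) -> float:
    """For each model (separation), draw one point set, cluster it, record the
    index value and the true error, then rank-correlate the two lists."""
    rng = random.Random(seed)
    index_vals, errors = [], []
    for sep in seps:
        pts, truth = sample_point_set(rng, sep)
        pred = two_means_1d(pts)
        index_vals.append(wss_ratio(pts, pred))
        errors.append(clustering_error(truth, pred))
    return kendall_tau(index_vals, errors)


if __name__ == "__main__":
    print(estimate_operator_error(2.0))
    print(index_error_correlation([0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]))
```

A tau close to +1 would suggest the index ranks realizations in the same order as their true errors. The abstract reports that, across models and algorithms, such agreement is highly variable. Brute-force matching in `clustering_error` is used only for clarity. For more than a handful of clusters, an assignment-problem solver would be the usual substitute.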
Pages: 807-824
Page count: 18
Related papers
50 in total
  • [41] Model-Based Clustering of Temporal Data
    El Assaad, Hani
    Same, Allou
    Govaert, Gerard
    Aknin, Patrice
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2013, 2013, 8131 : 9 - 16
  • [42] Probabilistic assessment of model-based clustering
    Zhu, Xuwen
    Melnykov, Volodymyr
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2015, 9 (04) : 395 - 422
  • [43] Model-based clustering for longitudinal data
    De la Cruz-Mesia, Rolando
    Quintana, Fernando A.
    Marshall, Guillermo
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (03) : 1441 - 1457
  • [44] Model-based clustering with soft balancing
    Zhong, S
    Ghosh, J
    [J]. THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003 : 459 - 466
  • [45] MODEL-BASED CLUSTERING OF LARGE NETWORKS
    Vu, Duy Q.
    Hunter, David R.
    Schweinberger, Michael
    [J]. ANNALS OF APPLIED STATISTICS, 2013, 7 (02) : 1010 - 1039
  • [46] Model-based Bayesian clustering (MBBC)
    Joo, Yongsung
    Booth, James G.
    Namkoong, Younghwan
    Casella, George
    [J]. BIOINFORMATICS, 2008, 24 (06) : 874 - 875
  • [47] Model-Based QoS Evaluation and Validation for Embedded Wireless Sensor Networks
    Jaeger, Sven
    Jungebloud, Tino
    Maschotta, Ralph
    Zimmermann, Armin
    [J]. IEEE SYSTEMS JOURNAL, 2016, 10 (02) : 592 - 603
  • [48] Towards a prior validation of a model-based approach for mobile usability evaluation
    Ben Ammar, Lassaad
    [J]. INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2020, 7 (11) : 87 - 96
  • [49] Finite mixture models and model-based clustering
    Melnykov, Volodymyr
    Maitra, Ranjan
    [J]. STATISTICS SURVEYS, 2010, 4 : 80 - 116
  • [50] Model-based validation of a DOx sensor
    Clarke, DW
    Fraher, PMA
    [J]. CONTROL ENGINEERING PRACTICE, 1996, 4 (09) : 1313 - 1320