A new cluster validity for data clustering

Cited by: 17
Authors
Yang, Xulei
Song, Qing
Cao, Aize
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Comp Control Lab, Singapore 639798, Singapore
[2] Vanderbilt Univ, Sch Med, Nashville, TN 37232 USA
Keywords
cluster validity; data clustering; deterministic annealing; structural risk minimization; Vapnik-Chervonenkis-bound;
DOI
10.1007/s11063-006-9005-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cluster validity indexes are widely used to evaluate the quality of partitions produced by clustering algorithms. This paper presents a new validity index for data clustering, called the Vapnik-Chervonenkis-bound (VB) index. It is estimated using the structural risk minimization (SRM) principle, which optimizes the bound simultaneously over the distortion function (empirical risk) and the VC-dimension (model complexity). The cluster number that attains the smallest bound on the guaranteed risk is taken as the best description of the data structure. The deterministic annealing (DA) algorithm serves as the underlying clustering technique to produce the partitions. Five numerical examples and two real data sets illustrate the use of VB as a validity index, and its effectiveness is compared with that of several popular cluster-validity indexes. The comparative results show that the proposed VB index estimates the cluster number well and, in addition, offers a new approach to cluster validity from the viewpoint of statistical learning theory.
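The selection rule described in the abstract (choose the cluster number whose risk bound is smallest) can be illustrated with a minimal sketch. Everything specific in it is an assumption and not taken from the paper: plain k-means stands in for deterministic annealing, the VC-dimension is guessed as h = k(d + 1), and the bound uses a Vapnik-style penalization factor 1 / (1 - sqrt(p - p ln p + ln n / (2n))) with p = h / n, a form often used in practical VC-based model selection. The paper's actual VB index may differ in all three respects. The function names (`kmeans`, `vc_penalty`, `select_k`, `make_blobs`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd k-means (an assumed stand-in for DA).

    Returns (centers, mean distortion), where the distortion plays the
    role of the empirical risk.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k), key=lambda c: sq_dist(pt, centers[c]))
            groups[nearest].append(pt)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    distortion = sum(min(sq_dist(pt, c) for c in centers) for pt in points) / len(points)
    return centers, distortion


def vc_penalty(h, n):
    """Assumed penalization factor; infinite when the bound is vacuous."""
    if h >= n:
        return math.inf
    p = h / n
    eps = math.sqrt(p - p * math.log(p) + math.log(n) / (2 * n))
    return math.inf if eps >= 1.0 else 1.0 / (1.0 - eps)


def select_k(points, k_max):
    """Score each k by distortion * penalty and return (best k, all scores)."""
    n, d = len(points), len(points[0])
    scores = {}
    for k in range(1, k_max + 1):
        _, distortion = kmeans(points, k)
        h = k * (d + 1)  # assumption: VC-dimension grows linearly with k
        scores[k] = distortion * vc_penalty(h, n)
    return min(scores, key=scores.get), scores


def make_blobs(centers, per_cluster, sigma, seed=1):
    """Synthetic Gaussian blobs for the demo (not data from the paper)."""
    rng = random.Random(seed)
    return [
        tuple(rng.gauss(c, sigma) for c in center)
        for center in centers
        for _ in range(per_cluster)
    ]


points = make_blobs([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)], per_cluster=20, sigma=0.5)
best_k, scores = select_k(points, k_max=6)
for k in sorted(scores):
    print(k, round(scores[k], 3))
print("selected k:", best_k)
```

The distortion alone always favors more clusters, so the penalty term is what lets a finite k win. How sharply it does so depends on the assumed form of h and on the sample size, and k-means can also settle in a poor local minimum, so the selected value should be read as illustrative only.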
Pages: 325-344
Page count: 20
Related papers
50 in total
  • [1] A New Cluster Validity for Data Clustering
    Xulei Yang
    Aize Cao
    Qing Song
    [J]. Neural Processing Letters, 2006, 23 : 325 - 344
  • [2] A Data Clustering Tool with Cluster Validity Indices
    Qiao, Haiyan
    Edwards, Brandon
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTING, ENGINEERING AND INFORMATION, 2009 : 303 - 309
  • [3] A new cluster-validity for fuzzy clustering
    Zahid, N
    Limouri, N
    Essaid, A
    [J]. PATTERN RECOGNITION, 1999, 32 (07) : 1089 - 1097
  • [4] A Study on Cluster Validity Measures for Clustering Network Data
    Hamasuna, Yukihiro
    Ozaki, Ryo
    Endo, Yasunori
    [J]. 2017 JOINT 17TH WORLD CONGRESS OF INTERNATIONAL FUZZY SYSTEMS ASSOCIATION AND 9TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (IFSA-SCIS), 2017
  • [5] Assessment of microarray data clustering results based on a new geometrical index for cluster validity
    Lam, Benson S. Y.
    Yan, Hong
    [J]. SOFT COMPUTING, 2007, 11 (04) : 341 - 348
  • [6] Assessment of Microarray Data Clustering Results Based on a New Geometrical Index for Cluster Validity
    Benson S. Y. Lam
    Hong Yan
    [J]. Soft Computing, 2007, 11 : 341 - 348
  • [7] A new clustering algorithm based on cluster validity indices
    Kim, M
    Ramakrishna, RS
    [J]. DISCOVERY SCIENCE, PROCEEDINGS, 2004, 3245 : 322 - 329
  • [8] Clustering Heterogeneous Web Data Using Clustering by Compression. Cluster Validity
    Cernian, Alexandra
    Carstoiu, Dorin
    Olteanu, Adriana
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, 2009 : 123 - 126
  • [9] An integrated tool for microarray data clustering and cluster validity assessment
    Bolshakova, N
    Azuaje, F
    Cunningham, P
    [J]. BIOINFORMATICS, 2005, 21 (04) : 451 - 455
  • [10] Study of some clustering algorithms with a new cluster validity criterion
    Zribi, Ali
    Chtourou, Mohamed
    Djemel, Mohamed
    [J]. KUWAIT JOURNAL OF SCIENCE & ENGINEERING, 2012, 39 (1B) : 127 - 147