Automatically Determining the Number of Clusters in Unlabeled Data Sets

Cited by: 62
Authors
Wang, Liang [1]
Leckie, Christopher [1]
Ramamohanarao, Kotagiri [1]
Bezdek, James [1]
Affiliation
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Parkville, Vic 3010, Australia
Funding
Australian Research Council;
Keywords
Clustering; cluster tendency; reordered dissimilarity image; VAT; VISUAL ASSESSMENT; TENDENCY; SELECTION; AID;
DOI
10.1109/TKDE.2008.158
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data, which is a basic input for most clustering algorithms. In this paper, we investigate a new method called Dark Block Extraction (DBE) for automatically estimating the number of clusters in unlabeled data sets, which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set, using several common image and signal processing techniques. Its basic steps include 1) generating a VAT image of an input dissimilarity matrix, 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering, 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal, and 4) smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to decide the number of clusters. Our DBE method is nearly "automatic," depending on just one easy-to-set parameter. Several numerical and real-world examples are presented to illustrate the effectiveness of DBE.
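The steps listed in the abstract can be illustrated with a heavily simplified sketch. It is not the authors' DBE implementation: the VAT reordering follows the commonly described Prim-like ordering, a plain mean threshold stands in for the image segmentation step, and the directional morphological filtering and distance transform are omitted entirely. The function names and the `window` and `min_height_ratio` parameters are hypothetical, chosen only for this illustration.

```python
def vat_order(R):
    """Prim-like VAT ordering of a symmetric dissimilarity matrix R (list of lists)."""
    n = len(R)
    # Start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(R[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j] = smallest dissimilarity from j to any already ordered object
    best = {j: R[start][j] for j in remaining}
    while remaining:
        j = min(remaining, key=lambda k: (best[k], k))
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            best[k] = min(best[k], R[j][k])
    return order


def estimate_num_clusters(R, window=3, min_height_ratio=0.25):
    """Toy dark-block count; window and min_height_ratio are illustrative assumptions."""
    n = len(R)
    order = vat_order(R)
    # Step 1: reordered dissimilarity image.
    V = [[R[a][b] for b in order] for a in order]
    # Step 2 (simplified): mean threshold instead of a real segmentation method.
    threshold = sum(map(sum, V)) / (n * n)
    dark = [[1 if v < threshold else 0 for v in row] for row in V]
    # Step 3 (simplified): project dark pixels onto the main diagonal axis.
    # Pixel (i, j) lands at position i + j; no distance transform is applied.
    signal = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            signal[i + j] += dark[i][j]
    # Step 4: moving-average smoothing, first difference, peak detection.
    half = window // 2
    smooth = []
    for s in range(len(signal)):
        chunk = signal[max(0, s - half): s + half + 1]
        smooth.append(sum(chunk) / len(chunk))
    deriv = [smooth[s + 1] - smooth[s] for s in range(len(smooth) - 1)]
    floor = min_height_ratio * max(smooth)
    # A peak is where the derivative goes from positive to non-positive.
    peaks = [s + 1 for s in range(len(deriv) - 1)
             if deriv[s] > 0 and deriv[s + 1] <= 0 and smooth[s + 1] >= floor]
    return len(peaks)


# Toy data (an assumption for illustration): three well-separated 1-D groups,
# interleaved so that the input order does not reveal the grouping.
points = ([0.1 * k for k in range(6)]
          + [10 + 0.1 * k for k in range(8)]
          + [20 + 0.1 * k for k in range(5)])
points = points[::3] + points[1::3] + points[2::3]
R = [[abs(p - q) for q in points] for p in points]
print(estimate_num_clusters(R))  # prints 3
```

On this toy input each dark block along the diagonal produces one triangular bump in the projection signal, so counting the major peaks recovers the three groups. Real data would need the segmentation, filtering, and distance transform steps that the abstract describes.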
Pages: 335-350
Page count: 16
Related papers
50 in total
  • [1] Enhanced Dark Block Extraction Method Performed Automatically to Determine the Number of Clusters in Unlabeled Data Sets
    Prabhu, P.
    Duraiswamy, K.
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2013, 8 (02) : 275 - 293
  • [2] A Method for Automatically Determining The Number of Clusters of LAC
    Liu, Han
    Wu, Qingfeng
    Dong, Huailin
    Wang, Shuangshuang
    Cai, Qing
    Ma, Zhuo
    ICCSSE 2009: PROCEEDINGS OF 2009 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, 2009, : 1907 - +
  • [3] An examination of indexes for determining the number of clusters in binary data sets
    Evgenia Dimitriadou
    Sara Dolničar
    Andreas Weingessel
    Psychometrika, 2002, 67 : 137 - 159
  • [4] An examination of indexes for determining the number of clusters in binary data sets
    Dimitriadou, E
    Dolnicar, S
    Weingessel, A
    PSYCHOMETRIKA, 2002, 67 (01) : 137 - 159
  • [5] Fuzzy clustering algorithm for automatically determining the number of clusters
    Hu Yangyang
    Liu Zengli
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,
  • [6] A Spectral Clustering Algorithm for Automatically Determining Clusters Number
    Chen, Bin
    Wang, Ya-lin
    Gong, Fan-ying
    Wang, Xiao-li
    Yang, Chun-hua
    2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014, : 3723 - 3728
  • [7] EVALUATION OF COEFFICIENTS FOR DETERMINING THE OPTIMAL NUMBER OF CLUSTERS IN CLUSTER ANALYSIS ON REAL DATA SETS
    Loster, Tomas
    9TH INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2015, : 1014 - 1023
  • [8] A Clustering Algorithm for Automatically Determining the Number of Clusters Based on Coefficient of Variation
    Liu, Tengteng
    Qu, Shouning
    Zhang, Kun
    PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018), 2018, : 100 - 106
  • [9] Dynamic estimation of number of clusters in data sets
    Boudraa, AO
    ELECTRONICS LETTERS, 1999, 35 (19) : 1606 - 1608
  • [10] Fuzzy C-means clustering algorithm for automatically determining the number of clusters
    Wang, Zhihe
    Wang, Shuyan
    Du, Hui
    Guo, Hao
    2020 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS 2020), 2020, : 223 - 227