A hybrid method for estimating the predominant number of clusters in a data set

Cited by: 0
Authors
Al Shaqsi, Jamil [1 ]
Wang, Wenjia [2 ]
Affiliations
[1] Sultan Qaboos Univ, Dept Informat Syst, Muscat, Oman
[2] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
cluster analysis; cluster number; similarity measure
DOI
10.1109/ICMLA.2012.146
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In cluster analysis, determining the number of clusters, K, for a given dataset is an important yet difficult task, because non-trivial real-world problems often have no universally accepted right or wrong answer, and the appropriate K also depends on the context and purpose of the cluster study. This paper presents a new hybrid method for automatically estimating the predominant number of clusters. It employs a new similarity measure, calculates the length, L, of constant similarity intervals, and takes the longest consistent intervals as representing the most probable numbers of clusters in the given context. An error function is defined to evaluate the goodness of the estimates. The proposed method was tested on 3 synthetic datasets and 8 real-world benchmark datasets and compared with several other popular methods. The experimental results show that the proposed method determines the desired number of clusters for all the simulated datasets and most of the benchmark datasets, and statistical tests indicate that it is significantly better than the compared methods.
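The interval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure, how the intervals are built, or the error function. The sketch assumes that a clustering procedure has already been run at a series of increasing similarity thresholds and has reported a cluster count at each one; the function name `predominant_k` and both input lists are hypothetical. It then returns the cluster count that stays constant over the longest threshold interval.

```python
from itertools import groupby


def predominant_k(thresholds, cluster_counts):
    """Return (k, interval_length) for the longest constant run of k.

    Assumption (not from the paper): thresholds are sorted in increasing
    order and cluster_counts[i] is the number of clusters found at
    thresholds[i] by some clustering procedure.
    """
    if not thresholds or len(thresholds) != len(cluster_counts):
        raise ValueError("inputs must be non-empty and of equal length")

    best_k, best_len = None, -1.0
    # Group consecutive positions that share the same cluster count.
    for k, run in groupby(enumerate(cluster_counts), key=lambda p: p[1]):
        idx = [i for i, _ in run]
        # Interval length = span of thresholds over which k is unchanged.
        length = thresholds[idx[-1]] - thresholds[idx[0]]
        if length > best_len:  # ties keep the earliest run
            best_k, best_len = k, length
    return best_k, best_len


# Hypothetical input: the cluster count stays at 3 from 0.3 to 0.6.
ts = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ks = [8, 5, 3, 3, 3, 3, 2, 2, 1]
k, length = predominant_k(ts, ks)
print(k, round(length, 3))
```

In this made-up input the count of 3 persists over the widest span of thresholds, so it is reported as the predominant number of clusters. A fuller treatment would also have to decide how to handle the trivial single-cluster run, which the sketch leaves untouched.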
Pages: 569-573
Number of pages: 5