Estimating the number of clusters in a numerical data set via quantization error modeling

Times cited: 40
Authors
Kolesnikov, Alexander [1 ]
Trichina, Elena [2 ]
Kauranne, Tuomo [3 ]
Affiliations
[1] Arbonaut Ltd, Joensuu, Finland
[2] Univ Eastern Finland, Joensuu, Finland
[3] Lappeenranta Univ Technol, Lappeenranta, Finland
Keywords
Clustering; Number of clusters; Vector quantization; Color quantization; Dominant colors; Fractal dimensions; Algorithm
DOI
10.1016/j.patcog.2014.09.017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider the problem of unsupervised clustering (vector quantization) of multidimensional numerical data. We propose a new method for determining an optimal number of clusters in the data set. The method is based on parametric modeling of the quantization error. The model parameter can be treated as the effective dimensionality of the data set. The proposed method was tested with artificial and real numerical data sets, and the results of the experiments demonstrate empirically not only the effectiveness of the method but also its ability to cope with difficult cases where other known methods fail. (C) 2014 Elsevier Ltd. All rights reserved.
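The abstract does not spell out the functional form of the quantization-error model. As a rough, non-authoritative illustration, the Python sketch below assumes the classical high-resolution law of vector quantization, D(k) ≈ a·k^(-2/d), under which the mean squared error D of a good k-level codebook for d-dimensional data decays as a power of k; fitting this law in log-log space turns the exponent into an effective dimensionality d. For choosing the number of clusters it swaps in a different, separately published criterion, the jump method of Sugar and James (transform D(k) into D(k)^(-y) and take the largest increase), simplified here to plain squared Euclidean error. This is not the authors' selection rule. All function names (kmeans_mse, quantization_error_curve, fit_effective_dimension, jump_select) are hypothetical, and the clustering step is a plain Lloyd k-means with k-means++ seeding.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points stored as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_mse(points, k, n_init=3, max_iter=30, seed=0):
    """Hypothetical helper: Lloyd's k-means with k-means++ seeding.
    Returns the lowest mean squared quantization error over n_init restarts."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        # k-means++: each new seed is drawn with probability proportional to its
        # squared distance to the nearest seed chosen so far.
        centers = [rng.choice(points)]
        d2 = [sq_dist(p, centers[0]) for p in points]
        while len(centers) < k:
            c = rng.choices(points, weights=d2)[0]
            centers.append(c)
            d2 = [min(old, sq_dist(p, c)) for old, p in zip(d2, points)]
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest codevector.
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
                groups[j].append(p)
            # Update step: move each codevector to its cell mean (empty cells stay put).
            new_centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                           for g, c in zip(groups, centers)]
            if new_centers == centers:
                break
            centers = new_centers
        mse = sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)
        best = min(best, mse)
    return best


def quantization_error_curve(points, ks, seed=0):
    """Hypothetical helper: MSE of the best codebook found for each k in ks."""
    return [kmeans_mse(points, k, seed=seed) for k in ks]


def fit_effective_dimension(ks, errors):
    """Assumption: D(k) ~ a * k**(-2/d), not the paper's exact model.
    Least-squares fit of log D = log a - (2/d) * log k; returns d."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    if slope >= 0:
        raise ValueError("error does not decrease with k; power law does not apply")
    return -2.0 / slope


def jump_select(errors, y):
    """Jump method (Sugar and James), simplified to plain MSE.
    errors[i] is D(k) for k = i + 1; D(0)**(-y) is taken as 0."""
    transformed = [0.0] + [e ** (-y) for e in errors]
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, len(transformed))]
    return 1 + jumps.index(max(jumps))


# Demo 1: uniform points in the unit square, so the fitted d should be near 2.
rng = random.Random(1)
square = [(rng.random(), rng.random()) for _ in range(300)]
ks = [8, 12, 16, 24, 32]
d_hat = fit_effective_dimension(ks, quantization_error_curve(square, ks))
print(f"effective dimension of uniform 2-D data: {d_hat:.2f}")

# Demo 2: three tight, well-separated Gaussian blobs; y = p/2 = 1 for p = 2 features.
blobs = [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
         for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(50)]
k_hat = jump_select(quantization_error_curve(blobs, range(1, 7)), y=1.0)
print(f"jump-method estimate of the number of clusters: {k_hat}")
```

On uniformly distributed 2-D points the fitted exponent should come out close to d = 2, and on the well-separated blobs the jump is largest at three groups. On strongly clustered data, a single power law fitted over small k can be distorted by the sharp error drops at the cluster boundaries, so the fitted d should not be read as an intrinsic dimension there.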
Pages: 941-952
Number of pages: 12
Related papers
  • [1] Estimating the number of clusters in a data set via the gap statistic
    Tibshirani, R
    Walther, G
    Hastie, T
    Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2001, 63: 411-423
  • [2] A hybrid method for estimating the predominant number of clusters in a data set
    Al Shaqsi, Jamil
    Wang, Wenjia
    2012 11th International Conference on Machine Learning and Applications (ICMLA 2012), Vol. 2, 2012: 569-573
  • [3] A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set
    Peng, Yi
    Zhang, Yong
    Kou, Gang
    Shi, Yong
    PLOS ONE, 2012, 7(7)
  • [4] Estimating the number of clusters from distributional results of partitioning a given data set
    Möller, U
    Adaptive and Natural Computing Algorithms, 2005: 151-154
  • [5] Estimating the Number of Clusters via the GUD Statistic
    Kou, Jiyao
    Journal of Computational and Graphical Statistics, 2014, 23(2): 403-417
  • [6] Estimating the number of clusters in DNA microarray data
    Bolshakova, N
    Azuaje, F
    Methods of Information in Medicine, 2006, 45(2): 153-157
  • [7] Estimating the number of clusters in a ranking data context
    Calmon, Wilson
    Albi, Mariana
    Information Sciences, 2021, 546: 977-995
  • [8] An ensemble method for estimating the number of clusters in a big data set using multiple random samples
    Mahmud, Mohammad Sultan
    Huang, Joshua Zhexue
    Ruby, Rukhsana
    Wu, Kaishun
    Journal of Big Data, 2023, 10(1)
  • [10] Estimating the Number of Clusters via System Evolution for Cluster Analysis of Gene Expression Data
    Wang, Kaijun
    Zheng, Jie
    Zhang, Junying
    Dong, Jiyang
    IEEE Transactions on Information Technology in Biomedicine, 2009, 13(5): 848-853