NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set

Cited by: 0
Authors
Charrad, Malika [1, 2]
Ghazzali, Nadia [3]
Boiteau, Veronique [4]
Niknafs, Azam [4]
Affiliations
[1] Univ Gabes, Inst Super Informat, Medenine 4100, Tunisia
[2] Univ Laval, Quebec City, PQ, Canada
[3] Univ Quebec Trois Rivieres, Quebec City, PQ, Canada
[4] Univ Laval, Dept Math & Stat, Quebec City, PQ G1K 7P4, Canada
Source
JOURNAL OF STATISTICAL SOFTWARE | 2014 / Volume 61 / Issue 06
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
R package; cluster validity; number of clusters; clustering; indices; k-means; hierarchical clustering; CRITERION; VALIDATION; INDEXES;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Clustering is the partitioning of a set of objects into groups (clusters) so that objects within a group are more similar to each other than to objects in different groups. Most clustering algorithms depend on some assumptions in order to define the subgroups present in a data set. As a consequence, the resulting clustering scheme requires some sort of evaluation of its validity. The evaluation procedure has to tackle difficult problems such as the quality of clusters, the degree to which a clustering scheme fits a specific data set, and the optimal number of clusters in a partitioning. In the literature, a wide variety of indices have been proposed to find the optimal number of clusters in a partitioning of a data set during the clustering process. However, for most of the indices proposed in the literature, programs are unavailable to test these indices and compare them. The R package NbClust has been developed for that purpose. It provides 30 indices which determine the number of clusters in a data set, and it also offers the user the best clustering scheme from the different results. In addition, it provides a function to perform k-means and hierarchical clustering with different distance measures and aggregation methods. Any combination of validation indices and clustering methods can be requested in a single function call. This enables the user to simultaneously evaluate several clustering schemes while varying the number of clusters, to help determine the most appropriate number of clusters for the data set of interest.
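The idea in the abstract of scoring clustering schemes while varying the number of clusters can be sketched roughly as follows. This is an illustrative Python sketch only, not NbClust's R interface or implementation: the function names (`kmeans`, `calinski_harabasz`), the toy data, and the deterministic farthest-point initialisation are all assumptions made for the example, and a single validity index (Calinski-Harabasz) stands in for the 30 indices the package provides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans(points, k, iterations=50):
    """Plain Lloyd k-means with a deterministic farthest-point start (assumed here)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so centers are already the cluster means
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster happens to be empty
                centers[j] = centroid(members)
    return labels, centers

def calinski_harabasz(points, labels, centers):
    """Ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom."""
    n = len(points)
    used = sorted(set(labels))
    k = len(used)
    overall = centroid(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in used)
    return (between / (k - 1)) / (within / (n - k))

# Toy data (hypothetical): three tight, well-separated groups of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score every candidate number of clusters and keep the one with the highest index.
scores = {}
for k in range(2, 7):
    labels, centers = kmeans(points, k)
    scores[k] = calinski_harabasz(points, labels, centers)
best_k = max(scores, key=scores.get)
print(best_k)
```

On this toy data the index peaks at three clusters. In practice different indices can disagree, which is why comparing many of them at once, as the abstract describes, can be useful.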
Pages: 1-36
Number of pages: 36