An examination of indexes for determining the number of clusters in binary data sets

Authors
Evgenia Dimitriadou
Sara Dolničar
Andreas Weingessel
Affiliations
[1] Technische Universität Wien, Institut für Statistik und Wahrscheinlichkeitstheorie
[2] Wirtschaftsuniversität Wien, Institut für Tourismus und Freizeitwirtschaft
Source
Psychometrika | 2002 / Vol. 67
Keywords
number of clusters; clustering indexes; binary data; artificial data sets; market segmentation
DOI
Not available
Abstract
The problem of choosing the correct number of clusters is as old as cluster analysis itself. A number of authors have suggested various indexes to facilitate this crucial decision. One of the most extensive comparative studies of indexes was conducted by Milligan and Cooper (1985). The present piece of work pursues the same goal under different conditions. In contrast to Milligan and Cooper's work, the emphasis here is on high-dimensional empirical binary data. Binary artificial data sets are constructed to reflect features typically encountered in real-world data situations in the field of marketing research. The simulation includes 162 binary data sets that are clustered by two different algorithms and lead to recommendations on the number of clusters for each index under consideration. Index results are evaluated and their performance is compared and analyzed.
Pages: 137-159
Page count: 22
Related papers
50 in total
  • [21] Local and Global Data Spread Based Index for Determining Number of Clusters in a Dataset
    Riyaz, Romana
    Wani, M. Arif
    2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016, : 651 - 656
  • [22] Determining the Optimal Number of Clusters using Silhouette Score as a Data Mining Technique
    Januzaj, Ylber
    Beqiri, Edmond
    Luma, Artan
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2023, 19 (04) : 174 - 182
  • [23] Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering
    David Martin-Fernandez, Jose
    Maria Luna-Romera, Jose
    Pontes, Beatriz
    Riquelme-Santos, Jose C.
    14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019), 2020, 950 : 3 - 13
  • [24] Finding the number of natural clusters in groundwater data sets using the concept of equivalence class
    Pacheco, FAL
    COMPUTERS & GEOSCIENCES, 1998, 24 (01) : 7 - 15
  • [25] Recovering the number of clusters in data sets with noise features using feature rescaling factors
    de Amorim, Renato Cordeiro
    Hennig, Christian
    INFORMATION SCIENCES, 2015, 324 : 126 - 145
  • [26] DETERMINING APPROPRIATE GROUP NUMBER AND COMPOSITION FOR DATA SETS CONTAINING REPEATED CHECK CULTIVARS
    BULL, JK
    BASFORD, KE
    DELACY, IH
    COOPER, M
    FIELD CROPS RESEARCH, 1993, 31 (3-4) : 369 - 383
  • [27] Determining the number of signals correlated across multiple data sets for small sample support
    Song, Yang
    Hasija, Tanuj
    Schreier, Peter J.
    Ramirez, David
    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1528 - 1532
  • [28] DETERMINING THE OPTIMAL NUMBER OF CLUSTERS IN CLUSTER ANALYSIS
    Loster, Tomas
    10TH INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2016, : 1078 - 1090
  • [29] A new similarity measure and its use in determining the number of clusters in a multivariate data set
    Vassiliou, A
    Tambouratzis, DG
    Koutras, MV
    Bersimis, S
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2004, 33 (07) : 1643 - 1666
  • [30] A Method for Automatically Determining The Number of Clusters of LAC
    Liu, Han
    Wu, Qingfeng
    Dong, Huailin
    Wang, Shuangshuang
    Cai, Qing
    Ma, Zhuo
    ICCSSE 2009: PROCEEDINGS OF 2009 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, 2009, : 1907 - +