Automatically Determining the Number of Clusters in Unlabeled Data Sets

Times cited: 62

Authors:

Wang, Liang [1]

Leckie, Christopher [1]

Ramamohanarao, Kotagiri [1]

Bezdek, James [1]

Affiliation:

[1] Univ Melbourne, Dept Comp Sci & Software Engn, Parkville, Vic 3010, Australia

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2009 / Vol. 21 / No. 03

Funding:

Australian Research Council

Keywords:

Clustering; cluster tendency; reordered dissimilarity image; VAT; VISUAL ASSESSMENT; TENDENCY; SELECTION; AID

DOI:

10.1109/TKDE.2008.158

CLC classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data, which is a basic input for most clustering algorithms. In this paper, we investigate a new method called Dark Block Extraction (DBE) for automatically estimating the number of clusters in unlabeled data sets, which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set, using several common image and signal processing techniques. Its basic steps include 1) generating a VAT image of an input dissimilarity matrix, 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering, 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal, and 4) smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to decide the number of clusters. Our DBE method is nearly "automatic," depending on just one easy-to-set parameter. Several numerical and real-world examples are presented to illustrate the effectiveness of DBE.
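The four DBE steps summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: Otsu's method for the image segmentation, a line-shaped opening along rows and columns standing in for the directional morphological filter, a city-block distance transform, moving-average smoothing, and a relative-height rule for deciding which peaks are "major" (the sketch counts peaks only and does not analyse valleys separately). Every function and parameter name (`vat_order`, `dbe_estimate`, `min_size`, `smooth`, `rel_height`) is hypothetical. In particular, `min_size` merely stands in for the paper's single parameter, whose exact definition is not given in this record. The VAT step follows the usual Prim-style reordering that starts from an object involved in the largest dissimilarity.

```python
import numpy as np


def vat_order(R):
    """VAT ordering: a Prim-style traversal of the dissimilarity matrix R."""
    n = R.shape[0]
    start = int(np.unravel_index(np.argmax(R), R.shape)[0])  # row of the largest dissimilarity
    order, remaining = [start], set(range(n)) - {start}
    nearest = R[start].astype(float)  # distance from each object to the selected set
    while remaining:
        cand = sorted(remaining)
        nxt = cand[int(np.argmin(nearest[cand]))]  # closest unselected object
        order.append(nxt)
        remaining.remove(nxt)
        nearest = np.minimum(nearest, R[nxt])
    return np.array(order)


def otsu_threshold(values, bins=256):
    """Otsu's method (assumed segmentation): maximise the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                      # pixel count in the "dark" class
    w1 = w0[-1] - w0                          # pixel count in the "bright" class
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    k = int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))
    return edges[k + 1]                       # upper edge of the last dark bin


def open_rows(B, length):
    """Binary opening of every row with a horizontal line of `length` pixels."""
    length = max(1, int(length))
    c = np.cumsum(np.pad(B.astype(int), ((0, 0), (1, 0))), axis=1)
    fits = (c[:, length:] - c[:, :-length]) == length  # line fits starting here
    out = np.zeros_like(B, dtype=bool)
    for shift in range(length):               # union of all fitting lines
        out[:, shift:shift + fits.shape[1]] |= fits
    return out


def cityblock_dt(B):
    """Two-pass city-block distance to the nearest background pixel.

    The image border is treated as background (zero padding)."""
    D = np.where(np.pad(B, 1), sum(B.shape) + 2, 0).astype(np.int64)
    m, n = B.shape
    for i in range(1, m + 1):                 # forward pass: top and left neighbours
        for j in range(1, n + 1):
            if D[i, j]:
                D[i, j] = min(D[i, j], D[i - 1, j] + 1, D[i, j - 1] + 1)
    for i in range(m, 0, -1):                 # backward pass: bottom and right neighbours
        for j in range(n, 0, -1):
            if D[i, j]:
                D[i, j] = min(D[i, j], D[i + 1, j] + 1, D[i, j + 1] + 1)
    return D[1:-1, 1:-1]


def count_major_peaks(signal, rel_height=0.2):
    """Count local maxima located by sign changes of the first-order derivative."""
    slope = np.sign(np.diff(signal))
    idx = np.flatnonzero(slope)               # skip flat stretches
    s = slope[idx]
    tops = idx[np.flatnonzero((s[:-1] > 0) & (s[1:] < 0))] + 1
    return int(np.sum(signal[tops] >= rel_height * signal.max()))


def dbe_estimate(R, min_size=5, smooth=3, rel_height=0.2):
    """Hypothetical end-to-end sketch of the four DBE steps."""
    order = vat_order(R)
    img = R[np.ix_(order, order)] / R.max()   # step 1: VAT image scaled to [0, 1]
    B = img < otsu_threshold(img)             # step 2: dark pixels become True
    B = open_rows(B, min_size) & open_rows(B.T, min_size).T
    D = cityblock_dt(B)                       # step 3: distance transform
    I, J = np.indices(D.shape)
    proj = np.bincount((I + J).ravel(), weights=D.ravel())  # projection onto the diagonal
    proj = np.convolve(proj, np.ones(smooth) / smooth, mode="same")  # step 4: smooth
    return count_major_peaks(proj, rel_height)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    print(dbe_estimate(R))  # three well-separated blobs
```

Pixels whose row and column indices have the same sum i + j project onto the same point of the main diagonal, which is why the projection is computed as a weighted `bincount` over i + j. Each compact dark block along the diagonal then produces one hump in the projection signal, separated from its neighbours by a valley, so counting the major humps gives the estimated number of clusters.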

Pages: 335-350

Number of pages: 16