Automatically Determining the Number of Clusters in Unlabeled Data Sets

Cited by: 62

Authors:

Wang, Liang [1]

Leckie, Christopher [1]

Ramamohanarao, Kotagiri [1]

Bezdek, James [1]

Affiliations:

[1] Univ Melbourne, Dept Comp Sci & Software Engn, Parkville, Vic 3010, Australia

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2009 / Vol. 21 / No. 03

Funding:

Australian Research Council

Keywords:

Clustering; cluster tendency; reordered dissimilarity image; VAT; VISUAL ASSESSMENT; TENDENCY; SELECTION; AID;

DOI:

10.1109/TKDE.2008.158

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data, which is a basic input for most clustering algorithms. In this paper, we investigate a new method called Dark Block Extraction (DBE) for automatically estimating the number of clusters in unlabeled data sets, which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set, using several common image and signal processing techniques. Its basic steps include 1) generating a VAT image of an input dissimilarity matrix, 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering, 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal, and 4) smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to decide the number of clusters. Our DBE method is nearly "automatic," depending on just one easy-to-set parameter. Several numerical and real-world examples are presented to illustrate the effectiveness of DBE.
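The abstract's first step, reordering a dissimilarity matrix so that clusters show up as dark blocks along the diagonal, can be illustrated with a minimal sketch. The `vat_order` function below follows the commonly described VAT ordering (a Prim-like traversal starting from an endpoint of the largest dissimilarity). The `count_blocks` helper and its `threshold` parameter are hypothetical and are only a crude stand-in for steps 2 to 4: it does not perform the image segmentation, morphological filtering, distance transform, or peak detection that DBE uses, and it assumes compact, well-separated clusters.

```python
def vat_order(D):
    """Return a VAT-style ordering of object indices for a square
    dissimilarity matrix D (list of lists)."""
    n = len(D)
    # Start from an object that is an endpoint of the largest dissimilarity.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j]: smallest dissimilarity from j to any already-ordered object.
    best = {j: D[start][j] for j in remaining}
    while remaining:
        # Pick the unordered object closest to the ordered set
        # (ties broken by index so the result is deterministic).
        nxt = min(remaining, key=lambda k: (best[k], k))
        order.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            if D[nxt][k] < best[k]:
                best[k] = D[nxt][k]
    return order


def reorder(D, order):
    """Permute rows and columns of D according to order."""
    return [[D[i][j] for j in order] for i in order]


def count_blocks(R, threshold):
    """Hypothetical simplification (NOT the DBE procedure): count diagonal
    blocks by cutting wherever two consecutive reordered objects are
    farther apart than threshold."""
    return 1 + sum(1 for i in range(len(R) - 1) if R[i][i + 1] > threshold)


# Toy 1-D data with three interleaved groups near 0, 10, and 20.
points = [0.0, 10.0, 0.5, 10.5, 1.0, 20.0, 20.4]
D = [[abs(a - b) for b in points] for a in points]

order = vat_order(D)
R = reorder(D, order)
print(order)
print(count_blocks(R, threshold=5.0))
```

On this toy input the reordering places members of each group next to each other, so small dissimilarities cluster in blocks along the diagonal of `R`. In the method described in the abstract, the number of such blocks is extracted from the VAT image by image and signal processing rather than by a fixed threshold.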


Pages: 335 - 350

Page count: 16

50 records in total

[31] A heuristic approach to classifying labeled/unlabeled data sets
Huang, K. Y.
JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2012, 63 (09) : 1248 - 1257
[32] Estimating the number of clusters in microarray data sets based on an information theoretic criterion
Nicorici, Daniel
Astola, Jaakko
Yli-Harja, Olli
2005 IEEE/SP 13TH WORKSHOP ON STATISTICAL SIGNAL PROCESSING (SSP), VOLS 1 AND 2, 2005, : 936 - 940
[33] Determining the number of clusters in cluster analysis
My-Young Cheong
Hakbae Lee
Journal of the Korean Statistical Society, 2008, 37 : 135 - 143
[34] Deep Embedding for Determining the Number of Clusters
Wang, Yiqi
Shi, Zhan
Guo, Xifeng
Liu, Xinwang
Zhu, En
Yin, Jianping
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8173 - 8174
[35] Determining the number of clusters by sampling with replacement
Tonidandel, S
Overall, JE
PSYCHOLOGICAL METHODS, 2004, 9 (02) : 238 - 249
[36] Fuzzy Clustering: Determining the Number of Clusters
Rezankova, Hana
Husek, Dusan
2012 FOURTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL ASPECTS OF SOCIAL NETWORKS (CASON), 2012, : 277 - 282
[37] Determining the number of clusters in cluster analysis
Cheong, My-Young
Lee, Hakbae
JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2008, 37 (02) : 135 - 143
[38] A NEW APPROACH FOR DETERMINING NUMBER OF CLUSTERS
Erisoglu, Murat
Erisoglu, Ulku
Servi, Tayfun
Sakallioglu, Sadullah
PAKISTAN JOURNAL OF STATISTICS, 2012, 28 (01) : 141 - 158
[39] Local and Global Data Spread Based Index for Determining Number of Clusters in a Dataset
Riyaz, Romana
Wani, M. Arif
2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016, : 651 - 656
[40] Determining the Optimal Number of Clusters using Silhouette Score as a Data Mining Technique
Januzaj, Ylber
Beqiri, Edmond
Luma, Artan
INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2023, 19 (04) : 174 - 182
