A hierarchical Gamma Mixture Model-based method for estimating the number of clusters in complex data

Cited by: 5
Authors
Azhar, Muhammad [1]
Huang, Joshua Zhexue [1, 2]
Masud, Md Abdul [1]
Li, Mark Junjie [1, 2]
Cui, Laizhong [1, 2]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Number of clusters; Initial cluster centers; Gamma Mixture Model (GMM); EM algorithm; Clustering algorithms; ALGORITHM; CENTERS; FIND
DOI
10.1016/j.asoc.2019.105891
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a new method for estimating the true number of clusters and the initial cluster centers in a dataset with many clusters. Observation points are placed in the data space, and the clusters are observed through the distributions of the distances between each observation point and the objects in the dataset. A Gamma Mixture Model (GMM) is built from a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by partitioning the dataset recursively. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed that ensemble multiple GMM trees to handle high-dimensional data with many clusters: the GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm builds a GMM forest sequentially. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the performance of the new algorithms. The results show that the proposed algorithms outperformed the popular existing methods Silhouette, Elbow, and Gap Statistic, as well as the recent I-nice method, in estimating the true number of clusters from high-dimensional complex data. (C) 2020 Elsevier B.V. All rights reserved.
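The core step described in the abstract (measure distances from an observation point to every object, then fit a Gamma mixture to that one-dimensional distance distribution) can be illustrated with a minimal sketch. This is not the authors' GMM-Tree implementation: the function names `distances_to_observer` and `fit_gamma_mixture` are hypothetical, the initialization by sorted chunks is an assumption, and the M-step uses weighted moment matching as a simple stand-in for the exact maximum-likelihood update of the Gamma shape parameter.

```python
import math
import numpy as np

def distances_to_observer(X, observer):
    """Euclidean distance from one observation point to every object in X."""
    return np.linalg.norm(X - observer, axis=1)

def gamma_logpdf(x, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at positive x."""
    return ((shape - 1.0) * np.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def fit_gamma_mixture(d, k, n_iter=100):
    """EM-style fit of a k-component Gamma mixture to distances d.

    Assumption: the M-step matches weighted means and variances
    (shape = mean^2 / var, scale = var / mean) instead of solving
    the exact maximum-likelihood equations.
    """
    d = np.clip(np.asarray(d, dtype=float), 1e-12, None)
    # Assumed initialization: split the sorted distances into k chunks.
    chunks = np.array_split(np.sort(d), k)
    means = np.array([c.mean() for c in chunks])
    variances = np.array([max(c.var(), 1e-12) for c in chunks])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        shapes, scales = means ** 2 / variances, variances / means
        # E-step: responsibilities, computed in log space for stability.
        log_p = np.stack(
            [math.log(w) + gamma_logpdf(d, a, s)
             for w, a, s in zip(weights, shapes, scales)], axis=1)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted moment matching.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * d[:, None]).sum(axis=0) / nk
        variances = np.maximum(
            (resp * (d[:, None] - means) ** 2).sum(axis=0) / nk, 1e-12)
    # Hard assignment: each object goes to its most responsible component.
    return weights, means, variances, resp.argmax(axis=1)

# Usage on toy data: two well-separated blobs seen from one observation point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, (100, 2)),
               rng.normal([10, 0], 0.3, (100, 2))])
d = distances_to_observer(X, np.array([-5.0, 0.0]))
weights, means, variances, labels = fit_gamma_mixture(d, k=2)
```

The hard assignments in `labels` split the objects into subsets by mixture component; applying the same step again to each subset is one way to picture the recursive partitioning that the abstract calls a GMM tree. How the number of components is chosen at each node is not stated in the abstract, so `k` is simply passed in here.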
Pages: 20