Identifying connected components in Gaussian finite mixture models for clustering

Cited by: 21
Authors
Scrucca, Luca [1]
Institutions
[1] Univ Perugia, Dept Econ, I-06123 Perugia, Italy
Keywords
Finite mixture of Gaussian distributions; Cluster analysis; Connected components; High density regions; Cluster cores; MAXIMUM-LIKELIHOOD; INCOMPLETE DATA; DENSITY; SELECTION; NUMBER; TREE;
DOI
10.1016/j.csda.2015.01.006
Chinese Library Classification
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835;
Abstract
Model-based clustering associates each component of a finite mixture distribution to a group or cluster. Therefore, an underlying implicit assumption is that a one-to-one correspondence exists between mixture components and clusters. In applications with multivariate continuous data, finite mixtures of Gaussian distributions are typically used. Information criteria, such as BIC, are often employed to select the number of mixture components. However, a single Gaussian density may not be sufficient, and two or more mixture components could be needed to reasonably approximate the distribution within a homogeneous group of observations. A clustering method, based on the identification of high density regions of the underlying density function, is introduced. Starting with an estimated Gaussian finite mixture model, the corresponding density estimate is used to identify the cluster cores, i.e. those data points which form the core of the clusters. Then, the remaining observations are allocated to those cluster cores for which the probability of cluster membership is the highest. The method is illustrated using both simulated and real data examples, which show how the proposed approach improves the identification of non-Gaussian clusters compared to a fully parametric approach. Furthermore, it enables the identification of clusters which cannot be obtained by merging mixture components, and it can be straightforwardly extended to cases of higher dimensionality. (C) 2015 Elsevier B.V. All rights reserved.
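The idea summarized in the abstract — use a fitted Gaussian mixture's density estimate to find high-density "cluster cores", link them into connected components, then allocate the remaining points — can be sketched in Python. This is an illustrative reconstruction, not the paper's algorithm: the function `core_cluster` and all its parameters are hypothetical, and non-core points are assigned to their nearest core point as a simplification of the paper's highest-membership-probability rule.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import radius_neighbors_graph

def core_cluster(X, n_components=4, density_quantile=0.25, radius=1.0, seed=0):
    """Cluster X by linking high-density 'core' points into connected components.

    Hypothetical sketch: parameters and the nearest-core allocation step are
    illustrative choices, not the method as published.
    """
    # Fit a Gaussian finite mixture and evaluate its (log) density at each point
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    logdens = gmm.score_samples(X)
    # Cluster cores = points inside the high-density region (top quantile)
    core = logdens >= np.quantile(logdens, density_quantile)
    # Connected components among core points lying within `radius` of each other
    graph = radius_neighbors_graph(X[core], radius=radius, include_self=True)
    n_clusters, core_labels = connected_components(graph, directed=False)
    labels = np.full(len(X), -1)
    labels[core] = core_labels
    # Allocate non-core points to the cluster of their nearest core point
    # (simplified stand-in for the posterior-membership allocation in the paper)
    rest = ~core
    if rest.any():
        labels[rest] = core_labels[cdist(X[rest], X[core]).argmin(axis=1)]
    return labels, n_clusters
```

Because clusters are built from connected high-density components rather than single mixture components, two or more Gaussian components approximating one non-Gaussian group end up merged into a single cluster, which is the behavior the abstract describes.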
Pages: 5-17
Page count: 13
Related papers
50 items in total
  • [1] Prediction of random effects in finite mixture models with Gaussian components
    Gianola, D
    JOURNAL OF ANIMAL BREEDING AND GENETICS, 2005, 122 (03) : 145 - 160
  • [2] Identifying the number of components in Gaussian mixture models using numerical algebraic geometry
    Shirinkam, Sara
    Alaeddini, Adel
    Gross, Elizabeth
    JOURNAL OF ALGEBRA AND ITS APPLICATIONS, 2020, 19 (11)
  • [3] On determining efficient finite mixture models with compact and essential components for clustering data
    Abas, Ahmed R.
    EGYPTIAN INFORMATICS JOURNAL, 2013, 14 (01) : 79 - 88
  • [4] Variable Selection for Clustering with Gaussian Mixture Models
    Maugis, Cathy
    Celeux, Gilles
    Martin-Magniette, Marie-Laure
    BIOMETRICS, 2009, 65 (03) : 701 - 709
  • [5] mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models
    Scrucca, Luca
    Fop, Michael
    Murphy, T. Brendan
    Raftery, Adrian E.
    R JOURNAL, 2016, 8 (01): 289 - 317
  • [6] Finite mixture models with negative components
    Zhang, BB
    Zhang, CS
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2005, 3587 : 31 - 41
  • [7] Nonparametric Finite Mixture of Gaussian Graphical Models
    Lee, Kevin H.
    Xue, Lingzhou
    TECHNOMETRICS, 2018, 60 (04) : 511 - 521
  • [8] Imputation through finite Gaussian mixture models
    Di Zio, Marco
    Guarnera, Ugo
    Luzi, Orietta
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (11) : 5305 - 5316
  • [9] A robust EM clustering algorithm for Gaussian mixture models
    Yang, Miin-Shen
    Lai, Chien-Yo
    Lin, Chih-Ying
    PATTERN RECOGNITION, 2012, 45 (11) : 3950 - 3961
  • [10] Reinforced EM Algorithm for Clustering with Gaussian Mixture Models
    Tobin, Joshua
    Ho, Chin Pang
    Zhang, Mimi
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 118 - 126