Identifying connected components in Gaussian finite mixture models for clustering

Cited by: 21
Author
Scrucca, Luca [1]
Affiliation
[1] Univ Perugia, Dept Econ, I-06123 Perugia, Italy
Keywords
Finite mixture of Gaussian distributions; Cluster analysis; Connected components; High density regions; Cluster cores; MAXIMUM-LIKELIHOOD; INCOMPLETE DATA; DENSITY; SELECTION; NUMBER; TREE;
DOI
10.1016/j.csda.2015.01.006
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Model-based clustering associates each component of a finite mixture distribution to a group or cluster. Therefore, an underlying implicit assumption is that a one-to-one correspondence exists between mixture components and clusters. In applications with multivariate continuous data, finite mixtures of Gaussian distributions are typically used. Information criteria, such as BIC, are often employed to select the number of mixture components. However, a single Gaussian density may not be sufficient, and two or more mixture components could be needed to reasonably approximate the distribution within a homogeneous group of observations. A clustering method, based on the identification of high density regions of the underlying density function, is introduced. Starting with an estimated Gaussian finite mixture model, the corresponding density estimate is used to identify the cluster cores, i.e. those data points which form the core of the clusters. Then, the remaining observations are allocated to those cluster cores for which the probability of cluster membership is the highest. The method is illustrated using both simulated and real data examples, which show how the proposed approach improves the identification of non-Gaussian clusters compared to a fully parametric approach. Furthermore, it enables the identification of clusters which cannot be obtained by merging mixture components, and it can be straightforwardly extended to cases of higher dimensionality. (C) 2015 Elsevier B.V. All rights reserved.
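The core idea in the abstract (several mixture components may jointly describe one cluster, and clusters correspond to connected high density regions) can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: the mixture parameters, the density level 0.05, the grid search, and all function names are hypothetical choices made for illustration, and the remaining points are allocated to the nearest core as a simple stand-in for the paper's rule of highest cluster membership probability.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, weights, means, sds):
    """Density of a finite Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def high_density_intervals(density, lo, hi, level, step=0.01):
    """Connected components of {x : density(x) >= level}, found on a grid over [lo, hi]."""
    intervals, start, prev = [], None, None
    n = int(round((hi - lo) / step))
    for i in range(n + 1):
        x = lo + i * step
        if density(x) >= level:
            if start is None:
                start = x          # a new component begins here
            prev = x
        elif start is not None:
            intervals.append((start, prev))  # the component just ended
            start = None
    if start is not None:
        intervals.append((start, prev))
    return intervals

def distance_to_interval(x, interval):
    """Zero inside the interval, otherwise the distance to its nearest endpoint."""
    a, b = interval
    return max(a - x, 0.0, x - b)

def assign_clusters(points, intervals):
    """Label each point with its closest interval; distance 0 marks a core point.

    Nearest-core allocation is an assumption of this sketch, not the paper's rule.
    """
    labels, is_core = [], []
    for x in points:
        dists = [distance_to_interval(x, iv) for iv in intervals]
        j = min(range(len(intervals)), key=dists.__getitem__)
        labels.append(j)
        is_core.append(dists[j] == 0.0)
    return labels, is_core

# Hypothetical fitted mixture: components 1 and 2 overlap and form one group.
weights, means, sds = [0.35, 0.35, 0.30], [0.0, 1.5, 8.0], [1.0, 1.0, 1.0]
density = lambda x: mixture_density(x, weights, means, sds)

points = [-2.0, -0.5, 0.0, 0.8, 1.5, 2.5, 4.0, 6.0, 7.5, 8.0, 8.6, 10.0]
intervals = high_density_intervals(density, -5.0, 13.0, level=0.05)
labels, is_core = assign_clusters(points, intervals)

print(len(intervals), "connected components from", len(weights), "mixture components")
print(list(zip(points, labels, is_core)))
```

With these made-up parameters the three-component mixture yields two connected high density regions, so the two overlapping components end up in a single cluster without any explicit merging step.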
Pages: 5-17
Number of pages: 13
Related papers
50 in total
  • [21] GAUSSIAN MIXTURE MODELS FOR CLUSTERING AND CALIBRATION OF ENSEMBLE WEATHER FORECASTS
    Jouan, Gabriel
    Cuzol, Anne
    Monbet, Valerie
    Monnier, Goulven
    DISCRETE AND CONTINUOUS DYNAMICAL SYSTEMS-SERIES S, 2023, 16 (02): : 309 - 328
  • [22] Fast clustering of GARCH processes via Gaussian mixture models
    Aielli, Gian Piero
    Caporin, Massimiliano
    MATHEMATICS AND COMPUTERS IN SIMULATION, 2013, 94 : 205 - 222
  • [23] Unsupervised clustering using nonparametric finite mixture models
    Hunter, David R.
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2024, 16 (01)
  • [24] Clustering via finite nonparametric ICA mixture models
    Zhu, Xiaotian
    Hunter, David R.
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 (01) : 65 - 87
  • [25] Discrete data clustering using finite mixture models
    Bouguila, Nizar
    ElGuebaly, Walid
    PATTERN RECOGNITION, 2009, 42 (01) : 33 - 42
  • [26] Clustering via finite nonparametric ICA mixture models
    Xiaotian Zhu
    David R. Hunter
    Advances in Data Analysis and Classification, 2019, 13 : 65 - 87
  • [27] Martingale Posterior Inference for Finite Mixture Models and Clustering
    Rodriguez, Carlos E.
    Mena, Ramses H.
    Walker, Stephen G.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2025,
  • [28] ON THE CONNECTED COMPONENTS OF MODULI SPACES OF FINITE FLAT MODELS
    Imai, Naoki
    AMERICAN JOURNAL OF MATHEMATICS, 2010, 132 (05) : 1189 - 1204
  • [29] Non-Gaussian Data Clustering via Expectation Propagation Learning of Finite Dirichlet Mixture Models and Applications
    Fan, Wentao
    Bouguila, Nizar
    NEURAL PROCESSING LETTERS, 2014, 39 (02) : 115 - 135
  • [30] Non-Gaussian Data Clustering via Expectation Propagation Learning of Finite Dirichlet Mixture Models and Applications
    Wentao Fan
    Nizar Bouguila
    Neural Processing Letters, 2014, 39 : 115 - 135