Identifying connected components in Gaussian finite mixture models for clustering

Cited by: 21
Author
Scrucca, Luca [1]
Affiliation
[1] Univ Perugia, Dept Econ, I-06123 Perugia, Italy
Keywords
Finite mixture of Gaussian distributions; Cluster analysis; Connected components; High density regions; Cluster cores; MAXIMUM-LIKELIHOOD; INCOMPLETE DATA; DENSITY; SELECTION; NUMBER; TREE
DOI
10.1016/j.csda.2015.01.006
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Model-based clustering associates each component of a finite mixture distribution with a group or cluster. Therefore, an underlying implicit assumption is that a one-to-one correspondence exists between mixture components and clusters. In applications with multivariate continuous data, finite mixtures of Gaussian distributions are typically used. Information criteria, such as BIC, are often employed to select the number of mixture components. However, a single Gaussian density may not be sufficient, and two or more mixture components could be needed to reasonably approximate the distribution within a homogeneous group of observations. A clustering method, based on the identification of high density regions of the underlying density function, is introduced. Starting with an estimated Gaussian finite mixture model, the corresponding density estimate is used to identify the cluster cores, i.e. those data points which form the core of the clusters. Then, the remaining observations are allocated to those cluster cores for which the probability of cluster membership is the highest. The method is illustrated using both simulated and real data examples, which show how the proposed approach improves the identification of non-Gaussian clusters compared to a fully parametric approach. Furthermore, it enables the identification of clusters which cannot be obtained by merging mixture components, and it can be straightforwardly extended to cases of higher dimensionality. (C) 2015 Elsevier B.V. All rights reserved.
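The two steps described in the abstract (find cluster cores as connected high density regions, then allocate the remaining observations) can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation. The mixture parameters, the density level 0.03, the grid, and the data points are all hypothetical, and allocation by distance to the nearest core is a simplified stand-in for the membership-probability rule mentioned in the abstract.

```python
import math

# Hypothetical "already estimated" 1-D Gaussian mixture: (weight, mean, sd).
# The first two components overlap and jointly describe one skewed group.
MIXTURE = [(0.35, 0.0, 1.0), (0.25, 2.0, 1.5), (0.40, 10.0, 1.0)]


def density(x, mixture=MIXTURE):
    """Mixture density f(x) = sum_k w_k * N(x | mu_k, sd_k^2)."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in mixture
    )


def level_set_intervals(level, lo, hi, steps=2000):
    """Connected components of {x : f(x) >= level} on [lo, hi], found on a grid."""
    intervals, start = [], None
    width = (hi - lo) / steps
    for i in range(steps + 1):
        x = lo + i * width
        inside = density(x) >= level
        if inside and start is None:
            start = x                      # a new component begins
        elif not inside and start is not None:
            intervals.append((start, x - width))  # component ended at previous grid point
            start = None
    if start is not None:
        intervals.append((start, hi))
    return intervals


def assign(points, intervals):
    """Label each point with the index of the closest interval (assumed rule).

    Points inside an interval have distance 0, so core points keep their core.
    """
    def gap(x, interval):
        a, b = interval
        return max(a - x, 0.0, x - b)

    return [min(range(len(intervals)), key=lambda j: gap(x, intervals[j]))
            for x in points]


# Hypothetical observations and an assumed density level for the cores.
points = [-1.2, 0.1, 0.8, 2.5, 4.0, 7.0, 9.1, 10.0, 11.3]
cores = level_set_intervals(level=0.03, lo=-6.0, hi=16.0)
labels = assign(points, cores)
print(len(MIXTURE), "components,", len(cores), "cluster cores")
print(labels)
```

With these assumed values the three mixture components yield only two connected high density regions, which is the situation the abstract targets: the number of clusters need not equal the number of Gaussian components. In more than one dimension the level set has no interval structure, so the connected components would have to be found differently, for example on a neighborhood graph of the high density data points.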
Pages: 5-17
Number of pages: 13