Identifying connected components in Gaussian finite mixture models for clustering

Cited by: 21
Authors
Scrucca, Luca [1 ]
Affiliations
[1] Univ Perugia, Dept Econ, I-06123 Perugia, Italy
Keywords
Finite mixture of Gaussian distributions; Cluster analysis; Connected components; High density regions; Cluster cores; Maximum likelihood; Incomplete data; Density; Selection; Number; Tree
DOI
10.1016/j.csda.2015.01.006
CLC number
TP39 [Applications of computers]
Discipline codes
081203; 0835
Abstract
Model-based clustering associates each component of a finite mixture distribution to a group or cluster. Therefore, an underlying implicit assumption is that a one-to-one correspondence exists between mixture components and clusters. In applications with multivariate continuous data, finite mixtures of Gaussian distributions are typically used. Information criteria, such as BIC, are often employed to select the number of mixture components. However, a single Gaussian density may not be sufficient, and two or more mixture components could be needed to reasonably approximate the distribution within a homogeneous group of observations. A clustering method, based on the identification of high density regions of the underlying density function, is introduced. Starting with an estimated Gaussian finite mixture model, the corresponding density estimate is used to identify the cluster cores, i.e. those data points which form the core of the clusters. Then, the remaining observations are allocated to those cluster cores for which the probability of cluster membership is the highest. The method is illustrated using both simulated and real data examples, which show how the proposed approach improves the identification of non-Gaussian clusters compared to a fully parametric approach. Furthermore, it enables the identification of clusters which cannot be obtained by merging mixture components, and it can be straightforwardly extended to cases of higher dimensionality. (C) 2015 Elsevier B.V. All rights reserved.
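As a rough illustration of the "cluster cores + allocation" idea described in the abstract, the sketch below fits a Gaussian mixture, flags high-density points as cores, links the cores into connected components, and assigns the remaining points to a core's cluster. This is a simplified stand-in, not the paper's actual algorithm or implementation: the use of scikit-learn/SciPy, the density quantile, the linkage radius, and the nearest-core allocation (a crude proxy for the membership probabilities used in the paper) are all illustrative assumptions.

```python
# Simplified sketch of a core-and-allocate clustering step, assuming
# scikit-learn and SciPy; thresholds below are arbitrary illustrative choices.
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier, radius_neighbors_graph

X, _ = make_moons(n_samples=500, noise=0.05, random_state=1)

# 1. Fit a Gaussian finite mixture, deliberately with more components than
#    the number of groups one expects (two half-moons here).
gmm = GaussianMixture(n_components=6, random_state=1).fit(X)

# 2. Use the mixture density estimate to flag high-density points as "cores".
log_dens = gmm.score_samples(X)
core = log_dens >= np.quantile(log_dens, 0.25)   # keep the top 75% by density

# 3. Link nearby core points and take connected components; each component
#    plays the role of one cluster core.
graph = radius_neighbors_graph(X[core], radius=0.2, include_self=True)
n_clusters, core_labels = connected_components(graph, directed=False)

# 4. Allocate the remaining low-density points to a cluster core; here simply
#    via the nearest core point (a proxy for cluster-membership probability).
knn = KNeighborsClassifier(n_neighbors=1).fit(X[core], core_labels)
labels = np.empty(len(X), dtype=int)
labels[core] = core_labels
labels[~core] = knn.predict(X[~core])

print("clusters found:", n_clusters)
```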
Pages: 5-17
Number of pages: 13
Related papers (50 records in total)
  • [31] Bayesian learning of finite generalized Gaussian mixture models on images. Elguebaly, Tarek; Bouguila, Nizar. SIGNAL PROCESSING, 2011, 91(4): 801-820.
  • [32] Empirical Bayes estimation utilizing finite Gaussian Mixture Models. Orellana, Rafael; Carvajal, Rodrigo; Aguero, Juan C. 2019 IEEE CHILEAN CONFERENCE ON ELECTRICAL, ELECTRONICS ENGINEERING, INFORMATION AND COMMUNICATION TECHNOLOGIES (CHILECON), 2019.
  • [33] Image Restoration Based on Improved Patch Clustering in Gaussian Mixture Models. Qiu, Guoqing; Wei, Yating; Yang, Haijing; Wang, Yantao; Luo, Pan. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019: 4502-4505.
  • [34] Meta-learning representations for clustering with infinite Gaussian mixture models. Iwata, Tomoharu. NEUROCOMPUTING, 2023, 549.
  • [35] Clustering protein sequence and structure space with infinite Gaussian mixture models. Dubey, A; Hwang, S; Rangel, C; Rasmussen, CE; Ghahramani, Z; Wild, DL. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2004, 2003: 399-410.
  • [36] Model based clustering of audio clips using Gaussian mixture models. Chandrakala, S.; Sekhar, C. Chandra. ICAPR 2009: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION, PROCEEDINGS, 2009: 47-50.
  • [37] Vine copula mixture models and clustering for non-Gaussian data. Sahin, Ozge; Czado, Claudia. ECONOMETRICS AND STATISTICS, 2022, 22: 136-158.
  • [38] A novel line scan clustering algorithm for identifying connected components in digital images. Yang, Y; Zhang, D. IMAGE AND VISION COMPUTING, 2003, 21(5): 459-472.
  • [39] Robust clustering approach to fuzzy Gaussian mixture models for speaker identification. Tran, Dat; Wagner, Michael. International Conference on Knowledge-Based Intelligent Electronic Systems, Proceedings, KES, 1999: 337-340.
  • [40] On the distribution of posterior probabilities in finite mixture models with application in clustering. Melnykov, Volodymyr. JOURNAL OF MULTIVARIATE ANALYSIS, 2013, 122: 175-189.