Model-Based Clustering of Mixed Data With Sparse Dependence

Authors
Choi, Young-Geun [1]
Ahn, Soohyun [2]
Kim, Jayoun [3]
Affiliations
[1] Sungkyunkwan Univ, Dept Math Educ, Seoul 03063, South Korea
[2] Ajou Univ, Dept Math, Suwon 16499, Gyeonggi Do, South Korea
[3] Seoul Natl Univ Hosp, Med Res Collaborating Ctr, Seoul 03080, South Korea
Funding
National Research Foundation of Singapore
Keywords
Latent Gaussian mixture model; maximum likelihood; model-based clustering; Monte Carlo expectation-maximization algorithm; MIXTURE; BINARY
DOI
10.1109/ACCESS.2023.3296790
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Mixed data consist of both continuous and categorical variables, and clustering such data is a long-standing statistical problem. The latent Gaussian mixture model, a model-based approach to this problem, has received attention owing to its simplicity and interpretability. However, it is prone to problems of high dimensionality: parameters must be estimated separately for each group, and the number of covariance parameters grows quadratically with the number of variables. To address this, we propose "regClustMD," a novel model-based clustering method that accounts for sparse dependence among variables. We consider a sparse latent Gaussian mixture model in which the precision matrix of the variables is assumed to have only a few nonzero elements. We estimate the model by maximizing a penalized complete log-likelihood with the Monte Carlo expectation-maximization (MCEM) algorithm. Our numerical experiments and real data analyses demonstrate that our method outperforms a counterpart algorithm in both accuracy and failure rate when the variables are correlated.
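To make the dimensionality argument and the estimation objective concrete, here is a minimal Python sketch. For the parameter count it assumes K groups, each with an unrestricted p x p covariance matrix. For the objective it assumes the latent Gaussian values and group labels are already available (in MCEM they would come from Monte Carlo draws) and uses an L1, graphical-lasso-style penalty on the off-diagonal entries of each group's precision matrix. The function names `n_covariance_params` and `penalized_complete_loglik` are hypothetical, and the penalty form is an assumption, not necessarily the one used in regClustMD.

```python
import numpy as np


def n_covariance_params(p: int, n_groups: int) -> int:
    """Free parameters in K unrestricted p x p covariance matrices.

    Each symmetric p x p matrix has p(p+1)/2 free entries, so the total
    grows quadratically in p.
    """
    return n_groups * p * (p + 1) // 2


def penalized_complete_loglik(x, z, means, precisions, weights, lam):
    """Penalized complete-data log-likelihood of a Gaussian mixture (sketch).

    x          : (n, p) latent Gaussian values (assumed already imputed;
                 in MCEM these would be Monte Carlo draws)
    z          : (n,) integer group labels in {0, ..., K-1}
    means      : (K, p) group means
    precisions : (K, p, p) group precision (inverse covariance) matrices
    weights    : (K,) mixing proportions
    lam        : penalty level; the L1 penalty on off-diagonal precision
                 entries is an assumed, graphical-lasso-style choice
    """
    p = x.shape[1]
    loglik = 0.0
    for k in range(len(weights)):
        xk = x[z == k]
        if xk.shape[0] == 0:
            continue  # an empty group contributes nothing
        omega = precisions[k]
        sign, logdet = np.linalg.slogdet(omega)
        if sign <= 0:
            # Cheap sanity check only; it does not prove positive definiteness.
            raise ValueError("precision matrix must have a positive determinant")
        diff = xk - means[k]
        # Per-observation quadratic form (x - mu)^T Omega (x - mu)
        quad = np.einsum("ij,jk,ik->i", diff, omega, diff)
        # log pi_k + log N(x | mu_k, Omega_k^{-1}), summed over members of group k
        loglik += np.sum(
            np.log(weights[k]) + 0.5 * logdet - 0.5 * p * np.log(2 * np.pi) - 0.5 * quad
        )
    # Assumed penalty: sum of |off-diagonal entries| over all groups
    off_diag = sum(np.abs(om).sum() - np.abs(np.diag(om)).sum() for om in precisions)
    return loglik - lam * off_diag
```

For example, with p = 10 variables and K = 3 groups, `n_covariance_params(10, 3)` returns 165, while with p = 100 it returns 15150, which illustrates the quadratic growth. Because the latent values behind the categorical variables are unobserved, an MCEM scheme would average a quantity like this over Monte Carlo draws of the latent data and group memberships in the E-step and then maximize it in the M-step; that sampling step is not shown here.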
Pages: 75945-75954
Number of pages: 10