Penalized model-based clustering with unconstrained covariance matrices

Cited by: 52
Authors
Zhou, Hui [1]
Pan, Wei [1]
Shen, Xiaotong [2]
Affiliations
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Keywords
Covariance estimation; EM algorithm; Gaussian graphical models; high-dimension but low-sample size; L-1 penalization; normal mixtures; penalized likelihood; semi-supervised learning; VARIABLE SELECTION; MOLECULAR CLASSIFICATION; CANCER CLASSIFICATION; MICROARRAY DATA; MIXTURE MODEL; INFORMATION; REGRESSION
DOI
10.1214/09-EJS487
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Clustering is one of the most useful tools for high-dimensional analysis, e.g., for microarray data. It becomes challenging in the presence of a large number of noise variables, which may mask underlying clustering structures. Therefore, noise removal through variable selection is necessary. One effective way is regularization for simultaneous parameter estimation and variable selection in model-based clustering. However, existing methods focus on regularizing the mean parameters representing centers of clusters, ignoring dependencies among variables within clusters, which leads to incorrect orientations or shapes of the resulting clusters. In this article, we propose a regularized Gaussian mixture model with general covariance matrices, taking various dependencies into account. At the same time, this approach shrinks the means and covariance matrices, achieving better clustering and variable selection. To overcome one technical challenge in estimating possibly large covariance matrices, we derive an EM algorithm to utilize the graphical lasso (Friedman et al. 2007) for parameter estimation. Numerical examples, including applications to microarray gene expression data, demonstrate the utility of the proposed method.
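As a rough illustration of the kind of objective the abstract describes, an L-1 penalized log-likelihood for a Gaussian mixture with unconstrained covariance matrices could be written as below. This is a sketch under stated assumptions, not a formula quoted from this record: the symbols n (sample size), d (number of variables), K (number of clusters), \pi_k, \mu_k, \Sigma_k, W_k = \Sigma_k^{-1}, and the tuning parameters \lambda_1, \lambda_2 are all introduced here for illustration, and the exact penalty used in the article may differ.

```latex
% Assumed notation: \phi is the d-variate normal density,
% W_k = \Sigma_k^{-1} is the precision matrix of cluster k.
\log L_P(\Theta)
  = \sum_{j=1}^{n} \log\Bigl[ \sum_{k=1}^{K} \pi_k \,
      \phi(x_j;\, \mu_k, \Sigma_k) \Bigr]
  - \lambda_1 \sum_{k=1}^{K} \sum_{p=1}^{d} |\mu_{kp}|
  - \lambda_2 \sum_{k=1}^{K} \sum_{p,q=1}^{d} |W_{k;pq}|
```

Under an objective of this form, the E-step would compute posterior cluster membership probabilities as in an ordinary mixture model, while the M-step update for each W_k would reduce to an L-1 penalized Gaussian log-likelihood problem on a weighted sample covariance matrix, which is the type of problem the graphical lasso is designed to solve.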
Pages: 1473 - 1496
Number of pages: 24
Related papers
50 in total
  • [1] Model-based clustering with sparse covariance matrices
    Michael Fop
    Thomas Brendan Murphy
    Luca Scrucca
    Statistics and Computing, 2019, 29 : 791 - 819
  • [2] Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables
    Xie, Benhuai
    Pan, Wei
    Shen, Xiaotong
    ELECTRONIC JOURNAL OF STATISTICS, 2008, 2 : 168 - 212
  • [3] Model-based clustering with sparse covariance matrices
    Fop, Michael
    Murphy, Thomas Brendan
    Scrucca, Luca
    STATISTICS AND COMPUTING, 2019, 29 (04) : 791 - 819
  • [4] Penalized model-based clustering of fMRI data
    Dilernia, Andrew
    Quevedo, Karina
    Camchong, Jazmin
    Lim, Kelvin
    Pan, Wei
    Zhang, Lin
    BIOSTATISTICS, 2022, 23 (03) : 825 - 843
  • [5] Model-based principal components of covariance matrices
    Boik, Robert J.
    Panishkan, Kamolchanok
    Hyde, Scott K.
    BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY, 2010, 63 (01) : 113 - 137
  • [6] Penalized model-based clustering with application to variable selection
    Pan, Wei
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 1145 - 1164
  • [7] Penalized model-based clustering of complex functional data
    Nicola Pronello
    Rosaria Ignaccolo
    Luigi Ippoliti
    Sara Fontanella
    Statistics and Computing, 2023, 33
  • [8] Penalized model-based clustering of complex functional data
    Pronello, Nicola
    Ignaccolo, Rosaria
    Ippoliti, Luigi
    Fontanella, Sara
    STATISTICS AND COMPUTING, 2023, 33 (06)
  • [9] Rival penalized competitive learning for model-based sequence clustering
    Law, MH
    Kwok, JT
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS, 2000 : 195 - 198
  • [10] Adaptive Model-Based Decomposition of Polarimetric SAR Covariance Matrices
    Arii, Motofumi
    van Zyl, Jakob J.
    Kim, Yunjin
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2011, 49 (03) : 1104 - 1113