Pairwise Variable Selection for High-Dimensional Model-Based Clustering

Cited by: 46
Authors
Guo, Jian [1]
Levina, Elizaveta [1]
Michailidis, George [1]
Zhu, Ji [1]
Affiliations
[1] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA);
Keywords
EM algorithm; Gaussian mixture models; Model-based clustering; Pairwise fusion; Variable selection; CLASSIFICATION; LIKELIHOOD; PREDICTION;
DOI
10.1111/j.1541-0420.2009.01341.x
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Variable selection for clustering is an important and challenging problem in high-dimensional data analysis. Existing variable selection methods for model-based clustering select informative variables in a "one-in-all-out" manner; that is, a variable is selected if at least one pair of clusters is separable by this variable and removed if it cannot separate any of the clusters. In many applications, however, it is of interest to further establish exactly which clusters are separable by each informative variable. To address this question, we propose a pairwise variable selection method for high-dimensional model-based clustering. The method is based on a new pairwise penalty. Results on simulated and real data show that the new method performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.
Pages: 793-804
Page count: 12
Related papers
50 in total
  • [1] Variable selection for model-based high-dimensional clustering
    Wang, Sijian
    Zhu, Ji
    [J]. PREDICTION AND DISCOVERY, 2007, 443 : 177 - +
  • [2] Model-based clustering of high-dimensional data: Variable selection versus facet determination
    Poon, Leonard K. M.
    Zhang, Nevin L.
    Liu, Tengfei
    Liu, April H.
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2013, 54 (01) : 196 - 215
  • [3] Variable selection for model-based high-dimensional clustering and its application to microarray data
    Wang, Sijian
    Zhu, Ji
    [J]. BIOMETRICS, 2008, 64 (02) : 440 - 448
  • [4] High-dimensional variable selection with the plaid mixture model for clustering
    Chekouo, Thierry
    Murua, Alejandro
    [J]. COMPUTATIONAL STATISTICS, 2018, 33 (03) : 1475 - 1496
  • [5] High-dimensional variable selection with the plaid mixture model for clustering
    Thierry Chekouo
    Alejandro Murua
    [J]. Computational Statistics, 2018, 33 : 1475 - 1496
  • [6] Model-based clustering of high-dimensional data: A review
    Bouveyron, Charles
    Brunet-Saumard, Camille
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 52 - 78
  • [7] MODEL-BASED CLUSTERING OF HIGH-DIMENSIONAL DATA IN ASTROPHYSICS
    Bouveyron, C.
    [J]. STATISTICS FOR ASTROPHYSICS: CLUSTERING AND CLASSIFICATION, 2016, 77 : 91 - 119
  • [8] Variable selection for model-based clustering
    Raftery, AE
    Dean, N
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 168 - 178
  • [9] Model-based multifacet clustering with high-dimensional omics applications
    Zong, Wei
    Li, Danyang
    Seney, Marianne L.
    Mcclung, Colleen A.
    Tseng, George C.
    [J]. BIOSTATISTICS, 2024,
  • [10] Bayesian variable selection in clustering high-dimensional data
    Tadesse, MG
    Sha, N
    Vannucci, M
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2005, 100 (470) : 602 - 617