Pairwise Variable Selection for High-Dimensional Model-Based Clustering

Times cited: 46
Authors
Guo, Jian [1]
Levina, Elizaveta [1]
Michailidis, George [1]
Zhu, Ji [1]
Affiliation
[1] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
Funding
U.S. National Science Foundation
Keywords
EM algorithm; Gaussian mixture models; Model-based clustering; Pairwise fusion; Variable selection; Classification; Likelihood; Prediction
DOI
10.1111/j.1541-0420.2009.01341.x
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Variable selection for clustering is an important and challenging problem in high-dimensional data analysis. Existing variable selection methods for model-based clustering select informative variables in a "one-in-all-out" manner; that is, a variable is selected if at least one pair of clusters is separable by this variable and removed if it cannot separate any of the clusters. In many applications, however, it is of interest to further establish exactly which clusters are separable by each informative variable. To address this question, we propose a pairwise variable selection method for high-dimensional model-based clustering. The method is based on a new pairwise penalty. Results on simulated and real data show that the new method performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.
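The abstract contrasts "one-in-all-out" selection with pairwise selection and compares against ℓ1 and ℓ∞ penalties. The sketch below is an illustration only, not the authors' code. It assumes K cluster mean vectors over p variables stored as a list of rows, and it assumes the penalty forms noted in the comments: an ℓ1 penalty λ Σ_k Σ_j |μ_kj|, an ℓ∞ penalty λ Σ_j max_k |μ_kj|, and a pairwise fusion penalty λ Σ_j Σ_{k<k'} |μ_kj − μ_k'j|. The paper's exact penalty (including any weights) may differ, and every function name here is hypothetical. Nothing is fitted; the code only shows what each rule reports for a fixed set of cluster means.

```python
from itertools import combinations

# Hypothetical helpers illustrating the abstract's ideas; not the authors' implementation.
# `means` is a K x p list of lists: means[k][j] is the mean of variable j in cluster k.


def l1_penalty(means, lam):
    # Assumed lasso-type form: lam * sum_k sum_j |mu_kj|
    return lam * sum(abs(m) for row in means for m in row)


def linf_penalty(means, lam):
    # Assumed form: lam * sum_j max_k |mu_kj| (charges each variable's largest |mean|)
    p = len(means[0])
    return lam * sum(max(abs(row[j]) for row in means) for j in range(p))


def pairwise_fusion_penalty(means, lam):
    # Assumed form: lam * sum_j sum_{k<k'} |mu_kj - mu_k'j| (shrinks mean differences)
    K, p = len(means), len(means[0])
    return lam * sum(
        abs(means[a][j] - means[b][j])
        for j in range(p)
        for a, b in combinations(range(K), 2)
    )


def separable_pairs(means, tol=1e-8):
    # For each variable j, list the cluster pairs (k, k') whose means differ on j.
    K, p = len(means), len(means[0])
    return {
        j: [(a, b) for a, b in combinations(range(K), 2)
            if abs(means[a][j] - means[b][j]) > tol]
        for j in range(p)
    }


def one_in_all_out(means, tol=1e-8):
    # Keep a variable if it separates at least one pair of clusters; otherwise drop it.
    return [j for j, pairs in separable_pairs(means, tol).items() if pairs]


if __name__ == "__main__":
    # Made-up means for K = 3 clusters and p = 3 variables, for illustration only.
    means = [
        [0.0, 1.5, 0.0],
        [0.0, 1.5, 2.0],
        [0.0, -1.0, 2.0],
    ]
    print(one_in_all_out(means))   # [1, 2]
    print(separable_pairs(means))  # {0: [], 1: [(0, 2), (1, 2)], 2: [(0, 1), (0, 2)]}
    for f in (l1_penalty, linf_penalty, pairwise_fusion_penalty):
        print(f.__name__, f(means, lam=1.0))
```

With these means, the one-in-all-out rule only reports that variables 1 and 2 are informative, while `separable_pairs` also shows that variable 1 separates cluster 2 from clusters 0 and 1 but does not separate clusters 0 and 1 from each other. The assumed pairwise form charges nothing for tied cluster means and is unchanged if a constant is added to every cluster's mean on a variable, whereas the ℓ1 and ℓ∞ forms charge any nonzero mean whether or not the clusters agree.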
Pages: 793-804
Page count: 12