High-dimensional variable selection with the plaid mixture model for clustering

Authors
Thierry Chekouo
Alejandro Murua
Affiliations
[1] University of Minnesota Duluth, Department of Mathematics and Statistics
[2] Université de Montréal, Département de mathématiques et de statistique
Source
Computational Statistics, 2018, Vol. 33 (3)
Keywords
Classification; Model selection; Multiplicative mixture model; Monte Carlo EM; Kidney cancer genomic data
Abstract
In high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a method for analyzing such data that performs clustering and variable selection simultaneously. The method is inspired by the plaid model and may be seen as a multiplicative mixture model that allows for overlapping clustering: unlike conventional clustering, an observation may be explained by several clusters under this model. This characteristic makes it especially suitable for gene expression data. Parameters are estimated with the Monte Carlo expectation-maximization algorithm and importance sampling. Through extensive simulations and comparisons with competing methods, we show the advantages of our methodology in terms of both variable selection and clustering. An application of our approach to kidney renal cell carcinoma gene expression data from The Cancer Genome Atlas validates some previously identified cancer biomarkers.
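The idea of overlapping clustering mentioned in the abstract can be illustrated with a small sketch. It is not the authors' multiplicative mixture model. It only shows, under simplifying assumptions, how an observation's mean can be built from several clusters at once, in the spirit of the layered structure of the classic plaid model. The function name `layered_mean` and the values in `effects` and `memberships` are hypothetical and chosen purely for illustration.

```python
def layered_mean(membership, effects, baseline=0.0):
    """Mean vector of one observation under a simplified layered model.

    membership: sequence of 0/1 flags, one per cluster (several may be 1).
    effects:    one list of per-variable effects for each cluster.
    The mean is the baseline plus the effects of every cluster the
    observation belongs to (an assumption made for this illustration).
    """
    mean = [baseline] * len(effects[0])
    for belongs, effect in zip(membership, effects):
        if belongs:
            mean = [m + e for m, e in zip(mean, effect)]
    return mean


# Hypothetical effects of two clusters on four variables.
# A zero entry means the variable carries no signal for that cluster;
# the fourth variable is uninformative for both clusters.
effects = [
    [2.0, 2.0, 0.0, 0.0],    # cluster 1
    [0.0, -1.0, -1.0, 0.0],  # cluster 2
]

# Memberships: cluster 1 only, cluster 2 only, both clusters, no cluster.
memberships = [(1, 0), (0, 1), (1, 1), (0, 0)]

means = [layered_mean(m, effects) for m in memberships]
for m, mu in zip(memberships, means):
    print(m, mu)
```

The third observation belongs to both clusters, so its mean combines both sets of effects, which a conventional partition-based clustering cannot represent. The fourth variable never changes across clusters, which is the kind of variable a selection procedure would aim to discard.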
Pages: 1475-1496
Number of pages: 21