Bayesian variable selection in clustering high-dimensional data with substructure

Cited by: 0
Authors
Michael D. Swartz
Qianxing Mo
Mary E. Murphy
Joanne R. Lupton
Nancy D. Turner
Mee Young Hong
Marina Vannucci
Affiliations
[1] M.D. Anderson Cancer Center, Department of Epidemiology
[2] Memorial Sloan-Kettering Cancer Center, Department of Epidemiology and Biostatistics
[3] Texas A&M University, Nutrition and Food Science Department
[4] Texas A&M University, School of Exercise and Nutritional Sciences
[5] San Diego State University, Department of Statistics
[6] Rice University
Keywords
Bayesian inference; Designed experiments; Microarray analysis
DOI
Not available
Abstract
In this article we focus on clustering techniques recently proposed for high-dimensional data that incorporate variable selection and extend them to the modeling of data with a known substructure, such as the structure imposed by an experimental design. Our method essentially approximates the within-group covariance by facilitating clustering without disrupting the groups defined by the experimenter. The method we adopt simultaneously determines which expression patterns are important, and which genes contribute to such patterns. We evaluate performance on simulated data and on microarray data from a colon carcinogenesis study. Selected genes are biologically consistent with current research and provide strong biological validation of the cluster configuration identified by the method.
Pages: 407-423
Page count: 16