Incorporating Grouping Information in Bayesian Variable Selection with Applications in Genomics

Cited by: 29
Authors
Rockova, Veronika [1]
Lesaffre, Emmanuel [1, 2]
Affiliations
[1] Erasmus Univ, Dept Biostat, Erasmus MC, NL-3000 DR Rotterdam, Netherlands
[2] Katholieke Univ Leuven, L BioStat, Louvain, Belgium
Source
BAYESIAN ANALYSIS | 2014 / Vol. 9 / Issue 01
Keywords
Bayesian shrinkage estimation; EM algorithm; Bayesian LASSO; Minorization-maximization; NONCONCAVE PENALIZED LIKELIHOOD; MODEL SELECTION; REGRESSION; NETWORK; REGULARIZATION; EXPRESSION; PRIORS; CELLS
DOI
10.1214/13-BA846
Chinese Library Classification number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
In many applications it is of interest to determine a limited number of important explanatory factors (representing groups of potentially overlapping predictors) rather than original predictor variables. The often imposed requirement that the clustered predictors should enter the model simultaneously may be limiting as not all the variables within a group need to be associated with the outcome. Within-group sparsity is often desirable as well. Here we propose a Bayesian variable selection method, which uses the grouping information as a means of introducing more equal competition to enter the model within the groups rather than as a source of strict regularization constraints. This is achieved within the context of Bayesian LASSO (least absolute shrinkage and selection operator) by allowing each regression coefficient to be penalized differentially and by considering an additional regression layer to relate individual penalty parameters to a group identification matrix. The proposed hierarchical model therefore enables inference simultaneously on two levels: (1) the regression layer for the continuous outcome in relation to the predictors and (2) the regression layer for the penalty parameters in relation to the grouping information. Both situations with overlapping and non-overlapping groups are applicable. The method does not assume within-group homogeneity across the regression coefficients, which is implicit in many structured penalized likelihood approaches. The smoothness here is enforced at the penalty level rather than within the regression coefficients. To enhance the potential of the proposed method we develop two rapid computational procedures based on the expectation maximization (EM) algorithm, which offer substantial time savings in applications where the high-dimensionality renders Markov chain Monte Carlo (MCMC) approaches less practical. 
We demonstrate the usefulness of our method in predicting time to death in glioblastoma patients using pathways of genes.
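The two-level idea in the abstract (coefficient-specific penalties that are themselves driven by a group identification matrix) can be illustrated with a minimal sketch. This is not the authors' EM procedure: the penalties are held fixed rather than inferred, the log-linear link `lambda_j = exp(z_j . gamma)` is an assumption made here for illustration, and the names `penalties_from_groups`, `weighted_lasso`, `Z`, and `gamma` are hypothetical. The second function computes the posterior mode under independent Laplace priors with fixed rates, which is a differentially penalized LASSO.

```python
import numpy as np

def penalties_from_groups(Z, gamma):
    """Map group membership to per-coefficient penalties.

    Z is a (p x G) 0/1 group identification matrix; rows may belong to
    several groups (overlap). The log-linear link is an assumption of
    this sketch, not a statement of the paper's exact prior.
    """
    return np.exp(Z @ gamma)

def weighted_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for 0.5*||y - X b||^2 + sum_j lam[j]*|b[j]|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)       # squared norm of each column
    r = y.astype(float)                 # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # remove coordinate j from the fit
            rho = X[:, j] @ r
            # Soft-thresholding with a coefficient-specific threshold.
            b[j] = np.sign(rho) * max(abs(rho) - lam[j], 0.0) / col_sq[j]
            r -= X[:, j] * b[j]         # put the updated coordinate back
    return b
```

In this sketch, predictors that share a group share a contribution to their penalty, so smoothness is imposed on the penalties while the coefficients themselves remain free to differ or to be set to zero individually.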
Pages: 221-257
Page count: 37