Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior

Cited by: 7
Authors
Lee, Kyu Ha [1, 2]
Chakraborty, Sounak [3 ]
Sun, Jianguo [3 ]
Affiliations
[1] Forsyth Inst, Epidemiol & Biostat Core, Cambridge, MA USA
[2] Harvard Sch Dent Med, Dept Oral Hlth Policy & Epidemiol, Boston, MA USA
[3] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
Funding
National Science Foundation (USA)
Keywords
Accelerated failure time model; Bayesian lasso; Gibbs sampler; Group lasso; Penalized regression; FAILURE TIME MODEL; MICROARRAY DATA; SURVIVAL ANALYSIS; HAZARD RATIOS; ELASTIC NET; COX MODEL; REGRESSION; PREDICTION; SHRINKAGE;
DOI
10.1016/j.csda.2017.02.014
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
The variable selection problem is discussed in the context of high-dimensional failure time data arising from the accelerated failure time model. A data augmentation approach is employed to deal with censored survival times and to facilitate prior-posterior conjugacy. To identify a set of grouped relevant covariates, a shrinkage prior distribution that mimics the effect of the group lasso penalty is specified for the regression coefficients. Unlike the corresponding frequentist method, a Bayesian penalized regression approach cannot, in general, shrink coefficient estimates to exactly zero. To resolve this issue, a two-stage thresholding method that exploits the scaled neighborhood criterion and the Bayesian information criterion is devised. Simulation studies are performed to assess the robustness and performance of the proposed method in terms of variable selection accuracy and predictive power. The method is successfully applied to a set of microarray data from individuals diagnosed with diffuse large B-cell lymphoma. In addition, an R package called psbcGroup, which can be downloaded freely from CRAN, is developed for the implementation of the methods. (C) 2017 Elsevier B.V. All rights reserved.
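The data augmentation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes normally distributed errors on the log-time scale (a log-normal accelerated failure time model), which the abstract does not state explicitly. Each right-censored log survival time is replaced by a draw from a normal distribution truncated below at the observed log censoring time, while observed event times are left unchanged. The function names `impute_censored_log_time` and `augment` are hypothetical and are not part of the psbcGroup package.

```python
import random
from statistics import NormalDist


def impute_censored_log_time(mean, sd, log_censor_time, rng):
    """Draw from N(mean, sd^2) truncated to [log_censor_time, inf).

    Assumption: errors are normal on the log-time scale (log-normal AFT).
    """
    std_normal = NormalDist()
    # CDF of the truncation point on the standard-normal scale.
    lower = std_normal.cdf((log_censor_time - mean) / sd)
    # Uniform draw on [lower, 1), mapped back through the inverse CDF.
    u = lower + (1.0 - lower) * rng.random()
    # Keep u strictly inside (0, 1) so inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    draw = mean + sd * std_normal.inv_cdf(u)
    # Guard against numerical clamping pushing the draw below the bound.
    return max(draw, log_censor_time)


def augment(log_times, events, means, sd, rng):
    """Return a complete set of log times: observed ones are kept,
    censored ones (event indicator 0) are imputed."""
    return [
        t if d == 1 else impute_censored_log_time(m, sd, t, rng)
        for t, d, m in zip(log_times, events, means)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three subjects: the first is an observed event, the others are censored.
    completed = augment([1.0, 2.0, 0.5], [1, 0, 0], [0.8, 1.5, 1.0], 1.0, rng)
    print(completed)
```

In a Gibbs sampler, a step of this kind would be repeated at every iteration with `means` set to the current linear predictor, so that the regression coefficients can then be updated as in an ordinary Gaussian linear model.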
Pages: 1-13
Number of pages: 13
Related papers
50 in total
  • [1] Variable selection and subgroup analysis for high-dimensional censored data
    Zhang, Yu
    Wang, Jiangli
    Zhang, Weiping
    STATISTICAL THEORY AND RELATED FIELDS, 2024, 8 (03) : 211 - 231
  • [2] High-dimensional graphs and variable selection with the Lasso
    Meinshausen, Nicolai
    Buehlmann, Peter
    ANNALS OF STATISTICS, 2006, 34 (03) : 1436 - 1462
  • [3] LASSO-type variable selection methods for high-dimensional data
    Fu, Guanghui
    Wang, Pan
    ADVANCES IN COMPUTATIONAL MODELING AND SIMULATION, PTS 1 AND 2, 2014, 444-445 : 604 - 609
  • [4] High-Dimensional Variable Selection in Meta-Analysis for Censored Data
    Liu, Fei
    Dunson, David
    Zou, Fei
    BIOMETRICS, 2011, 67 (02) : 504 - 512
  • [5] Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations
    Sarkar R.
    Manage S.
    Gao X.
    Annals of Data Science, 2024, 11 (04) : 1139 - 1164
  • [6] HIGH-DIMENSIONAL VARIABLE SELECTION WITH RIGHT-CENSORED LENGTH-BIASED DATA
    Di He
    Zhou, Yong
    Zou, Hui
    STATISTICA SINICA, 2020, 30 (01) : 193 - 215
  • [7] A variable selection approach for highly correlated predictors in high-dimensional genomic data
    Zhu, Wencan
    Levy-Leduc, Celine
    Ternes, Nils
    BIOINFORMATICS, 2021, 37 (16) : 2238 - 2244
  • [8] Variable selection for high-dimensional incomplete data
    Liang, Lixing
    Zhuang, Yipeng
    Yu, Philip L. H.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2024, 192
  • [9] High-Dimensional Variable Selection for Survival Data
    Ishwaran, Hemant
    Kogalur, Udaya B.
    Gorodeski, Eiran Z.
    Minn, Andy J.
    Lauer, Michael S.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2010, 105 (489) : 205 - 217
  • [10] The joint lasso: high-dimensional regression for group structured data
    Dondelinger, Frank
    Mukherjee, Sach
    BIOSTATISTICS, 2020, 21 (02) : 219 - 235