Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior

Cited by: 7
Authors
Lee, Kyu Ha [1, 2]
Chakraborty, Sounak [3]
Sun, Jianguo [3]
Affiliations
[1] Forsyth Institute, Epidemiology and Biostatistics Core, Cambridge, MA, USA
[2] Harvard School of Dental Medicine, Department of Oral Health Policy and Epidemiology, Boston, MA, USA
[3] University of Missouri, Department of Statistics, Columbia, MO 65211, USA
Funding
National Science Foundation (NSF)
Keywords
Accelerated failure time model; Bayesian lasso; Gibbs sampler; Group lasso; Penalized regression; Failure time model; Microarray data; Survival analysis; Hazard ratios; Elastic net; Cox model; Regression; Prediction; Shrinkage
DOI
10.1016/j.csda.2017.02.014
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The variable selection problem is discussed in the context of high-dimensional failure time data arising from the accelerated failure time model. A data augmentation approach is employed to deal with censored survival times and to facilitate prior-posterior conjugacy. To identify a set of grouped relevant covariates, a shrinkage prior distribution that mimics the effect of the group lasso penalty is specified for the regression coefficients. It is noted that, unlike the corresponding frequentist method, a Bayesian penalized regression approach cannot in general shrink coefficient estimates to exactly zero. To resolve this issue, a two-stage thresholding method that exploits the scaled neighborhood criterion and the Bayesian information criterion is devised. Simulation studies are performed to assess the robustness and performance of the proposed method in terms of variable selection accuracy and predictive power. The method is successfully applied to a set of microarray data on individuals diagnosed with diffuse large B-cell lymphoma. In addition, an R package called psbcGroup, freely available from CRAN, has been developed to implement the methods. (C) 2017 Elsevier B.V. All rights reserved.
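The data augmentation step for censored survival times can be illustrated with a minimal sketch. It assumes a log-normal AFT model, log T_i = x_i'β + ε_i with ε_i ~ N(0, σ²), and right censoring only; the function `impute_log_times` and its arguments are hypothetical and are not taken from psbcGroup. Within a Gibbs sampler, each censored log time is replaced by a draw from the normal distribution implied by the current β and σ, truncated below at the observed censoring time, so that the completed data are Gaussian and conjugate updates become available.

```python
import numpy as np
from statistics import NormalDist


def impute_log_times(log_y, delta, X, beta, sigma, rng):
    """One data-augmentation step for a log-normal AFT model (illustrative sketch).

    log_y : observed log times (event or censoring time), shape (n,)
    delta : event indicator, 1 = event observed, 0 = right-censored
    X, beta, sigma : design matrix and current draws of beta and the error SD
    rng : numpy.random.Generator supplying uniform variates
    """
    log_y = np.asarray(log_y, dtype=float)
    delta = np.asarray(delta)
    mu = X @ beta                      # current linear predictor x_i' beta
    w = log_y.copy()                   # events keep their observed log time
    std_normal = NormalDist()
    for i in np.flatnonzero(delta == 0):
        # Standard-normal CDF at the censoring point: mass removed by truncation.
        lower = std_normal.cdf((log_y[i] - mu[i]) / sigma)
        # Inverse-CDF draw from N(mu_i, sigma^2) restricted to (log_y[i], inf).
        u = lower + (1.0 - lower) * rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)   # inv_cdf needs 0 < u < 1
        # Clipping only matters far in the tail; never return a value below log_y[i].
        w[i] = max(log_y[i], mu[i] + sigma * std_normal.inv_cdf(u))
    return w
```

Called once per Gibbs iteration, before β and σ² are updated, this yields the completed log times. The truncated draw uses inverse-CDF sampling through the standard library's `statistics.NormalDist`, which keeps the sketch free of dependencies beyond NumPy.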
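A shrinkage prior that mimics the group lasso penalty is commonly written as a scale mixture of normals. The display below sketches one standard hierarchy for group g with m_g coefficients and penalty parameter λ (Gamma in shape-rate form); it is offered as an illustration of the idea and is not necessarily the exact specification used in the paper. Integrating out τ_g² yields a prior proportional to exp(-(λ/σ)‖β_g‖₂), the Bayesian counterpart of a group lasso penalty on each group norm:

```latex
\beta_g \mid \sigma^2, \tau_g^2 \sim \mathcal{N}_{m_g}\!\left(\mathbf{0},\ \sigma^2 \tau_g^2 I_{m_g}\right),
\qquad
\tau_g^2 \sim \mathrm{Gamma}\!\left(\tfrac{m_g + 1}{2},\ \tfrac{\lambda^2}{2}\right),
\\[4pt]
\pi(\beta_g \mid \sigma^2)
= \int_0^\infty \mathcal{N}_{m_g}\!\left(\beta_g;\ \mathbf{0},\ \sigma^2 \tau_g^2 I_{m_g}\right)
\mathrm{Gamma}\!\left(\tau_g^2;\ \tfrac{m_g + 1}{2},\ \tfrac{\lambda^2}{2}\right) d\tau_g^2
\;\propto\; \exp\!\left(-\frac{\lambda}{\sigma}\,\lVert \beta_g \rVert_2\right).
```

Under this hierarchy the full conditional of β is multivariate normal and that of 1/τ_g² is inverse Gaussian, which is what makes a Gibbs sampler convenient once censored times have been imputed.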
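The first thresholding stage can be sketched as follows, assuming one common formulation of the scaled neighborhood criterion: a coefficient is retained when its posterior probability of lying outside [-sd_j, sd_j], where sd_j is its posterior standard deviation, exceeds a threshold p. The function name `scaled_neighborhood_select` and the default p = 0.5 are illustrative assumptions; the paper's group-level version and its BIC stage are not reproduced here.

```python
import numpy as np


def scaled_neighborhood_select(beta_draws, p=0.5):
    """Scaled-neighborhood thresholding on MCMC draws (illustrative sketch).

    beta_draws : array of shape (n_draws, n_coef) with posterior samples
    p          : probability threshold (0.5 is an illustrative default)
    Returns a boolean keep-mask and, for each coefficient, the estimated
    posterior probability that it lies outside [-sd_j, sd_j].
    """
    beta_draws = np.asarray(beta_draws, dtype=float)
    sd = beta_draws.std(axis=0, ddof=1)      # posterior SD of each coefficient
    outside = np.abs(beta_draws) > sd        # per-column comparison via broadcasting
    prob_outside = outside.mean(axis=0)      # Monte Carlo estimate of P(|beta_j| > sd_j | y)
    return prob_outside > p, prob_outside
```

In practice `beta_draws` would hold the post-burn-in Gibbs draws. Varying p produces a sequence of candidate models, among which a criterion such as BIC could then choose.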
Pages: 1-13
Number of pages: 13