Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data

Cited by: 392
Authors
Gui, J
Li, HZ [1]
Affiliations
[1] Univ Calif Davis, Rowe Program Human Genet, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
DOI
10.1093/bioinformatics/bti422
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer, in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research on linking gene expression profiles to censored survival data such as patients' overall survival time or time to cancer relapse. It would be desirable to have models that combine good prediction accuracy with parsimony.
Results: We propose to use L1 penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the recently developed least-angle regression (LARS) method. Our simulation studies and application to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Cox procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Cox regression gives better predictive performance than the L2 penalized regression and a few other dimension-reduction based methods.
Conclusions: We conclude that the proposed LARS-Cox procedure can be very useful in identifying genes relevant to survival phenotypes and in building a parsimonious predictive model that can be used for classifying future patients into clinically relevant high- and low-risk groups based on the gene expression profile and survival times of previous patients.
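For orientation, the L1 penalized Cox estimation named in the abstract is commonly written in the constrained form sketched below. The notation is an assumption for illustration and does not appear in this record: $t_i$ is the observed time for patient $i$, $\delta_i$ the event indicator (1 for an observed death, 0 for censoring), $x_i$ the gene expression vector, $R(t_i)$ the set of patients still at risk at $t_i$, and $s \ge 0$ a tuning bound.

```latex
% Cox log partial likelihood (assumed notation, ties ignored)
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
  \Bigl[\, x_i^{\top}\beta \;-\; \log \sum_{j \in R(t_i)} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr]

% L1-constrained estimate; s is a tuning parameter chosen by the analyst
\hat{\beta}(s) \;=\; \arg\max_{\beta}\; \ell(\beta)
  \quad \text{subject to} \quad \sum_{k=1}^{p} \lvert \beta_k \rvert \;\le\; s
```

A small $s$ forces many coefficients to exactly zero, which is what yields the gene selection and parsimony the abstract describes; how the solution path over $s$ is computed is the role the abstract assigns to LARS.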
Pages: 3001-3008
Number of pages: 8