A comparative study of forest methods for time-to-event data: variable selection and predictive performance

Cited by: 4
Authors
Liu, Yingxin [1]
Zhou, Shiyu [1]
Wei, Hongxia [1]
An, Shengli [1]
Affiliation
[1] Southern Medical University, Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Department of Biostatistics, Guangzhou, Guangdong, People's Republic of China
Keywords
Survival analysis; Random survival forest; Conditional inference forest; Maximally selected rank statistics; Machine learning; Variable selection; Brier score
DOI
10.1186/s12874-021-01386-8
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Background: As a popular method in machine learning, the forest approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forest method, but it has drawbacks, such as a selection bias toward covariates with many possible split points. Conditional inference forests (CIF) are known to reduce this selection bias through a two-step splitting procedure that uses hypothesis tests to separate variable selection from splitting, but the method is computationally expensive. The recently proposed random forests with maximally selected rank statistics (MSR-RF) appear to be a substantial improvement over RSF and CIF.

Methods: In this paper, we used a simulation study and a real-data application to compare the predictive performance and variable selection performance of three survival forest methods: RSF, CIF, and MSR-RF. To evaluate variable selection, we pooled all simulations and calculated how often the correct variables ranked at the top of the variable importance measures; a higher frequency means better selection ability. We used the integrated Brier score (IBS) and the c-index to measure the prediction accuracy of all three methods; the smaller the IBS, the better the prediction.

Results: Simulations show that the three forest methods differ only slightly in predictive performance. MSR-RF and RSF might perform better than CIF when the datasets contain only continuous or binary variables. For variable selection, when the datasets contain multiple categorical variables, RSF seems to have the lowest selection frequency in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well when an interaction term is present. The fact that the degree of correlation among the variables has little effect on selection frequency indicates that all three forest methods can handle correlated data. When the datasets contain only continuous variables, MSR-RF performs better. When they contain only binary variables, RSF and MSR-RF have an advantage over CIF. As the number of variables increases, MSR-RF and RSF seem to be more robust than CIF.

Conclusions: All three methods show advantages in predictive and variable selection performance under different conditions. The recently proposed MSR-RF methodology has practical value and is well worth wider adoption. In practice, it is important to choose the appropriate method according to the research aim and the nature of the covariates.
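The abstract measures prediction accuracy with the c-index and the integrated Brier score (IBS). As a minimal sketch, assuming each forest yields a scalar risk score per subject (for the c-index) and a predicted survival curve evaluated on a common time grid (for the IBS), both metrics can be computed with NumPy alone. The function names, the toy data, and the trapezoidal averaging over the grid are illustrative assumptions, not the authors' code or any package's API. Censoring is handled by inverse-probability-of-censoring weighting (IPCW) with a reverse Kaplan-Meier estimate of the censoring distribution.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index (c-index).

    A pair (i, j) is comparable when subject i has an observed event and
    time[i] < time[j]. It is concordant when the subject who failed first
    also has the higher predicted risk; tied risks count as 1/2.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        later = time > time[i]  # subjects still under observation after i fails
        comparable += int(later.sum())
        concordant += np.sum(risk[i] > risk[later]) + 0.5 * np.sum(risk[i] == risk[later])
    return float(concordant / comparable)


def censoring_survival(time, event, t, left_limit=False):
    """Reverse Kaplan-Meier estimate G(t), or G(t-) if left_limit, of the censoring distribution."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    g = 1.0
    for s in np.unique(time[~event]):  # distinct censoring times, ascending
        if s > t or (left_limit and s == t):
            break
        at_risk = np.sum(time >= s)
        n_censored = np.sum((time == s) & ~event)
        g *= 1.0 - n_censored / at_risk
    return float(g)


def brier_score(time, event, surv_t, t):
    """IPCW Brier score at horizon t; surv_t[i] is the predicted S_i(t)."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    g_t = censoring_survival(time, event, t)
    total = 0.0
    for ti, di, si in zip(time, event, np.asarray(surv_t, float)):
        if ti <= t and di:  # event observed by t: true status is 0
            total += si ** 2 / censoring_survival(time, event, ti, left_limit=True)
        elif ti > t:  # still event-free at t: true status is 1
            total += (1.0 - si) ** 2 / g_t
        # subjects censored at or before t contribute 0 (status at t unknown)
    return total / len(time)


def integrated_brier_score(time, event, surv, grid):
    """Trapezoidal average of the Brier score over the time grid.

    surv[i, k] is the predicted survival probability S_i(grid[k]).
    """
    grid, surv = np.asarray(grid, float), np.asarray(surv, float)
    bs = np.array([brier_score(time, event, surv[:, k], t) for k, t in enumerate(grid)])
    area = np.sum((bs[1:] + bs[:-1]) / 2.0 * np.diff(grid))
    return float(area / (grid[-1] - grid[0]))


if __name__ == "__main__":
    # Hypothetical toy data: observed times, event indicators (1 = event),
    # and a risk score where larger means earlier expected failure.
    time = np.array([2.0, 4.0, 6.0, 8.0])
    event = np.array([1, 0, 1, 1])
    risk = np.array([0.9, 0.5, 0.7, 0.1])
    print(harrell_c_index(time, event, risk))  # → 1.0

    # Hypothetical predicted survival curves on a time grid: a flat 0.8.
    grid = np.array([1.0, 3.0, 5.0, 7.0])
    surv = np.full((len(time), len(grid)), 0.8)
    print(integrated_brier_score(time, event, surv, grid))
```

A lower IBS and a higher c-index both indicate better prediction. The weight for an observed event uses G just before the event time (`left_limit=True`), following the usual IPCW convention; pairs with tied event times are not counted as comparable in this simple version of the c-index.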
Pages: 16