A comparative study of forest methods for time-to-event data: variable selection and predictive performance

Cited by: 4
Authors
Liu, Yingxin [1]
Zhou, Shiyu [1]
Wei, Hongxia [1]
An, Shengli [1]
Affiliations
[1] Southern Med Univ, Guangdong Prov Key Lab Trop Dis Res, Sch Publ Hlth, Dept Biostat, Guangzhou, Guangdong, Peoples R China
Keywords
Survival analysis; Random survival forest; Conditional inference forest; Maximally selected rank statistics; Machine learning; Variable selection; Brier score
DOI
10.1186/s12874-021-01386-8
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services management)]
Abstract
Background: As a popular approach in machine learning, survival forests are an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forest method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) reduce this selection bias through a two-step split procedure that uses hypothesis tests to separate variable selection from splitting, but the method is computationally expensive. The recently proposed random forests with maximally selected rank statistics (MSR-RF) appear to be a substantial improvement on both RSF and CIF.
Methods: We used a simulation study and a real data application to compare the prediction performance and variable selection performance of three survival forest methods: RSF, CIF and MSR-RF. To evaluate variable selection, we pooled all simulations and calculated how often the variable importance measures ranked the correct variables at the top; a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods. The smaller the IBS, the better the prediction.
Results: Simulations show that the three forest methods differ only slightly in prediction performance. MSR-RF and RSF might perform better than CIF when the datasets contain only continuous or binary variables. For variable selection, when the datasets contain multiple categorical variables, the selection frequency of RSF seems to be the lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well with the interaction term. The degree of correlation among the variables has little effect on the selection frequency, which indicates that the three forest methods can handle correlated data. When the datasets contain only continuous variables, MSR-RF performs better. When the datasets contain only binary variables, RSF and MSR-RF have an advantage over CIF. As the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF.
Conclusions: All three methods show advantages in prediction performance and variable selection performance under different situations. The recently proposed MSR-RF has practical value and deserves wider use. It is important to choose the appropriate method in practice according to the research aim and the nature of the covariates.
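The two evaluation ideas named in the Methods paragraph can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy data, and the choice of "top k, where k is the number of true variables" as the ranking cutoff are all assumptions made for illustration. The c-index shown is Harrell's estimator with tied event times skipped, which is one common definition and may differ from the variant used in the paper.

```python
from itertools import combinations


def selection_frequency(vim_runs, true_vars):
    """Fraction of simulation runs in which each true variable ranks in the top k.

    vim_runs: list of dicts mapping variable name -> importance, one per run.
    true_vars: names of the variables that truly affect the outcome.
    Assumption: k equals the number of true variables.
    """
    k = len(true_vars)
    hits = {v: 0 for v in true_vars}
    for vim in vim_runs:
        # Rank variables by importance, largest first, and keep the top k.
        top_k = set(sorted(vim, key=vim.get, reverse=True)[:k])
        for v in true_vars:
            if v in top_k:
                hits[v] += 1
    return {v: hits[v] / len(vim_runs) for v in true_vars}


def harrell_c_index(times, events, risks):
    """Harrell's c-index for right-censored data.

    A pair is comparable when the subject with the shorter time had an event.
    It is concordant when that subject also has the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: skip tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy variable importance values from four hypothetical simulation runs.
vim_runs = [
    {"x1": 0.30, "x2": 0.10, "x3": 0.02, "x4": 0.01},
    {"x1": 0.25, "x2": 0.01, "x3": 0.05, "x4": 0.00},
    {"x1": 0.20, "x2": 0.15, "x3": 0.03, "x4": 0.04},
    {"x1": 0.02, "x2": 0.18, "x3": 0.01, "x4": 0.22},
]
freq = selection_frequency(vim_runs, ["x1", "x2"])

# Toy survival data: follow-up time, event indicator (1 = event), predicted risk.
c = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 0, 1, 1],
    risks=[0.9, 0.3, 0.2, 0.5],
)
print(freq, c)
```

A higher selection frequency and a higher c-index both indicate better performance, whereas for the IBS smaller is better. The IBS itself is omitted here because a correct version needs censoring weights, which would not fit in a short sketch.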
Pages: 16
Journal: BMC Medical Research Methodology, 21