A comparative study of forest methods for time-to-event data: variable selection and predictive performance

Cited by: 4

Authors:

Liu, Yingxin [1]

Zhou, Shiyu [1]

Wei, Hongxia [1]

An, Shengli [1]

Affiliations:

[1] Southern Med Univ, Guangdong Prov Key Lab Trop Dis Res, Sch Publ Hlth, Dept Biostat, Guangzhou, Guangdong, Peoples R China

Source:

BMC MEDICAL RESEARCH METHODOLOGY | 2021 / Volume 21 / Issue 01

Keywords:

Survival analysis; Random survival Forest; Conditional inference Forest; Maximally selected rank statistics; Machine learning; Variable selection; Brier score;

DOI:

10.1186/s12874-021-01386-8

Chinese Library Classification (CLC) number:

R19 [Health care organization and services (health services administration)];

Discipline classification code:

Abstract:

Background: Forest-based methods, widely used in machine learning, are an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forest method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) is known to reduce this selection bias through a two-step split procedure that uses hypothesis tests to separate variable selection from splitting, but it is computationally expensive. The recently proposed random forests with maximally selected rank statistics (MSR-RF) appears to be a substantial improvement on RSF and CIF.

Methods: We used a simulation study and a real data application to compare the prediction performance and variable selection performance of three survival forest methods: RSF, CIF and MSR-RF. To evaluate variable selection, we pooled all simulations and calculated how frequently the variable importance measures of the correct variables ranked at the top; a higher frequency indicates better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods. A smaller IBS value indicates better prediction.

Results: Simulations show that the three forest methods differ slightly in prediction performance. MSR-RF and RSF might perform better than CIF when the datasets contain only continuous or binary variables. For variable selection performance, when the datasets contain multiple categorical variables, the selection frequency of RSF seems to be the lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well with the interaction term. The degree of correlation among the variables has little effect on the selection frequency, which indicates that the three forest methods can handle correlated data. When the datasets contain only continuous variables, MSR-RF performs better. When the datasets contain only binary variables, RSF and MSR-RF have more advantages than CIF. When the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF.

Conclusions: All three methods show advantages in prediction performance and variable selection performance under different situations. The recently proposed MSR-RF has practical value and is well worth popularizing. It is important to identify the appropriate method in practice according to the research aim and the nature of the covariates.
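
The two simpler evaluation ideas named in the abstract, the c-index and the top-ranking frequency of the correct variables, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `concordance_index` and `top_rank_frequency` are hypothetical, pairs with tied observed times are skipped for simplicity, and "ranking at the top" is interpreted here as falling within the top k importances, where k is the number of correct variables.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style c-index for right-censored data (simplified sketch).

    times: observed times; events: 1 if the event was observed, 0 if censored;
    risk_scores: higher score means higher predicted risk (shorter survival).
    Assumption: pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def top_rank_frequency(importances_per_run, true_vars):
    """Fraction of simulation runs in which each correct variable ranks
    within the top k importances, with k = len(true_vars) (an assumption).

    importances_per_run: list of dicts mapping variable name to importance.
    """
    k = len(true_vars)
    counts = {v: 0 for v in true_vars}
    for vimp in importances_per_run:
        top_k = sorted(vimp, key=vimp.get, reverse=True)[:k]
        for v in true_vars:
            if v in top_k:
                counts[v] += 1
    n_runs = len(importances_per_run)
    return {v: counts[v] / n_runs for v in true_vars}


# Toy inputs, made up purely for illustration.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
freq = top_rank_frequency(
    [{"x1": 0.5, "x2": 0.3, "x3": 0.1}, {"x1": 0.2, "x2": 0.1, "x3": 0.4}],
    ["x1", "x2"],
)
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The IBS is omitted from this sketch because handling censoring properly requires inverse-probability-of-censoring weights.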


Pages: 16
