Variable selection under multiple imputation using the bootstrap in a prognostic study

Cited by: 129
Authors
Heymans, Martijn W. [1 ]
van Buuren, Stef
Knol, Dirk L.
van Mechelen, Willem
de Vet, Henrica C. W.
Affiliations
[1] Vrije Univ Amsterdam, Inst Hlth Sci, Dept Methodol & Appl Biostat, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Inst Hlth Sci, Dept Methodol & Appl Biostat, Amsterdam, Netherlands
[3] TNO VUmc, BodyWork Res Ctr Phys Act & Hlth, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam Med Ctr, Dept Publ & Occupat Hlth, Amsterdam, Netherlands
[5] Vrije Univ Amsterdam Med Ctr, Inst Res Extramural Med, Amsterdam, Netherlands
[6] TNO Qual Life, Leiden, Netherlands
[7] Univ Utrecht, Dept Stat, NL-3508 TC Utrecht, Netherlands
[8] Vrije Univ Amsterdam Med Ctr, Amsterdam, Netherlands
[9] Vrije Univ Amsterdam Med Ctr, EMGO Inst, Amsterdam, Netherlands
DOI
10.1186/1471-2288-7-33
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)];
Abstract
Background: Missing data are a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed and tested a methodology that combines MI with bootstrapping techniques for studying prognostic variable selection. Method: In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables, the proportion of missing data ranged from 0 to 48.1%. We used four methods to investigate the respective influence of sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of the prognostic models developed by the four methods were assessed at different inclusion levels. Results: We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined, inclusion levels ranging from 0% (full model) to 90% yielded bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86. Conclusion: We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values.
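The inclusion-frequency idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how MI and bootstrapping are nested or which selection procedure is used, so the sketch assumes bootstrap samples are drawn within each imputed data set. The names `mi_bootstrap_selection`, `toy_impute` and `toy_select` are hypothetical, and the toy imputer and selector are simple stand-ins for a real imputation model and a real selection procedure.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set

Row = Dict[str, Optional[float]]


def inclusion_frequencies(selected_sets: Sequence[Iterable[str]],
                          candidates: Sequence[str]) -> Dict[str, float]:
    """Proportion of fitted models in which each candidate variable appears."""
    if not selected_sets:
        raise ValueError("need at least one selected model")
    counts = Counter(v for s in selected_sets for v in set(s))
    return {v: counts[v] / len(selected_sets) for v in candidates}


def select_at_level(freqs: Dict[str, float], level: float) -> List[str]:
    """Keep variables whose inclusion frequency reaches the level (0.0 keeps all)."""
    return [v for v, f in freqs.items() if f >= level]


def mi_bootstrap_selection(data: List[Row], candidates: Sequence[str],
                           impute: Callable[[List[Row], random.Random], List[Row]],
                           select: Callable[[List[Row]], Set[str]],
                           n_imputations: int, n_bootstraps: int,
                           seed: int = 0) -> Dict[str, float]:
    """Assumed nesting: impute first, then bootstrap within each imputed data set."""
    rng = random.Random(seed)
    selected_sets = []
    for _ in range(n_imputations):
        completed = impute(data, rng)
        for _ in range(n_bootstraps):
            # Resample rows with replacement to mimic sampling variation.
            sample = [rng.choice(completed) for _ in completed]
            selected_sets.append(select(sample))
    return inclusion_frequencies(selected_sets, candidates)


def toy_impute(data: List[Row], rng: random.Random) -> List[Row]:
    """Stand-in imputer: replace each missing value by a random observed value."""
    completed = []
    for row in data:
        new_row = dict(row)
        for name, value in row.items():
            if value is None:
                observed = [r[name] for r in data if r[name] is not None]
                new_row[name] = rng.choice(observed)
        completed.append(new_row)
    return completed


def toy_select(sample: List[Row], candidates: Sequence[str] = ("x1", "x2"),
               outcome: str = "y", min_gap: float = 1.0) -> Set[str]:
    """Stand-in selector: keep a variable if its group means differ by min_gap."""
    cases = [r for r in sample if r[outcome] == 1.0]
    controls = [r for r in sample if r[outcome] == 0.0]
    if not cases or not controls:
        return set()
    return {name for name in candidates
            if abs(mean(r[name] for r in cases)
                   - mean(r[name] for r in controls)) >= min_gap}


if __name__ == "__main__":
    data = [{"y": 1.0, "x1": 2.0, "x2": None}, {"y": 0.0, "x1": None, "x2": 1.0},
            {"y": 1.0, "x1": 3.0, "x2": 1.5}, {"y": 0.0, "x1": 0.5, "x2": None}]
    freqs = mi_bootstrap_selection(data, ["x1", "x2"], toy_impute, toy_select, 10, 20)
    print(freqs, select_at_level(freqs, 0.9))
```

In this sketch an inclusion level of 0.0 keeps every candidate, which corresponds to the full model mentioned in the abstract, while higher levels keep only variables that are selected consistently across imputed and resampled data sets.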
Pages: 11
Related papers
50 in total
  • [1] Variable selection under multiple imputation using the bootstrap in a prognostic study
    Martijn W Heymans
    Stef van Buuren
    Dirk L Knol
    Willem van Mechelen
    Henrica CW de Vet
    [J]. BMC Medical Research Methodology, 7
  • [2] Effect of Variable Selection Strategy on the Performance of Prognostic Models When Using Multiple Imputation
    Austin, Peter C.
    Lee, Douglas S.
    Ko, Dennis T.
    White, Ian R.
    [J]. CIRCULATION-CARDIOVASCULAR QUALITY AND OUTCOMES, 2019, 12 (11):
  • [3] Bootstrap inference for multiple imputation under uncongeniality and misspecification
    Bartlett, Jonathan W.
    Hughes, Rachael A.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2020, 29 (12) : 3533 - 3546
  • [4] Bootstrap inference when using multiple imputation
    Schomaker, Michael
    Heumann, Christian
    [J]. STATISTICS IN MEDICINE, 2018, 37 (14) : 2252 - 2266
  • [5] Applying the rescaling bootstrap under imputation: a simulation study
    Bruch, Christian
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2019, 89 (04) : 641 - 659
  • [6] Nonparametric Markov chain bootstrap for multiple imputation
    Zhang, LC
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2004, 45 (02) : 343 - 353
  • [7] Variable selection techniques after multiple imputation in high-dimensional data
    Faisal Maqbool Zahid
    Shahla Faisal
    Christian Heumann
    [J]. Statistical Methods & Applications, 2020, 29 : 553 - 580
  • [8] Variable selection techniques after multiple imputation in high-dimensional data
    Zahid, Faisal Maqbool
    Faisal, Shahla
    Heumann, Christian
    [J]. STATISTICAL METHODS AND APPLICATIONS, 2020, 29 (03) : 553 - 580
  • [9] Bayesian Variable Selection for Multiclass Classification using Bootstrap Prior Technique
    Olaniran, Oyebayo Ridwan
    Bin Abdullah, Mohd Asrul Affendi
    [J]. AUSTRIAN JOURNAL OF STATISTICS, 2019, 48 (02) : 63 - 72
  • [10] ROBUST, SPARSE AND SCALABLE INFERENCE USING BOOTSTRAP AND VARIABLE SELECTION FUSION
    Mozafari-Majd, Emadaldin
    Koivunen, Visa
    [J]. 2019 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2019), 2019 : 271 - 275