Multiple imputation in the presence of an incomplete binary variable created from an underlying continuous variable

Cited by: 9
Authors
Grobler, Anneke C. [1, 2]
Lee, Katherine [1, 2]
Affiliations
[1] Murdoch Childrens Res Inst, Clin Epidemiol & Biostat Unit, Parkville, Vic, Australia
[2] Univ Melbourne, Dept Paediat, Parkville, Vic, Australia
Funding
UK Medical Research Council
Keywords
binary variable; compatibility; fully conditional specification; multiple imputation; multivariate normal imputation; CHILDREN;
DOI
10.1002/bimj.201900011
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Multiple imputation (MI) is used to handle data that are missing at random (MAR). Despite warnings from statisticians, continuous variables are often recoded into binary variables. With MI it is important that the imputation and analysis models are compatible; variables should be imputed in the same form in which they appear in the analysis model. With an encoded binary variable, however, more accurate imputations may be obtained by imputing the underlying continuous variable. We conducted a simulation study to explore how best to impute a binary variable that was created from an underlying continuous variable. We generated a completely observed continuous outcome associated with an incomplete binary covariate that is a categorized version of an underlying continuous covariate, and an auxiliary variable associated with the underlying continuous covariate. We simulated data with several sample sizes, and set 25% or 50% of the covariate values to missing under a MAR mechanism dependent on the outcome and the auxiliary variable. We compared the performance of five imputation methods: (a) imputation of the binary variable using logistic regression; (b) imputation of the continuous variable using linear regression, followed by categorization into the binary variable; (c, d) imputation of both the continuous and binary variables using fully conditional specification (FCS) and multivariate normal imputation, respectively; (e) substantive-model compatible (SMC) FCS. Bias and standard errors were large when only the continuous variable was imputed. The other methods performed adequately. Imputation of both the binary and continuous variables using FCS often encountered mathematical difficulties. We recommend the SMC-FCS method, as it performed best in our simulation studies.
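The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the coefficients, the cut-point of 0, the logistic missingness model, and the function names `simulate`, `ols_slope`, and `impute_then_dichotomize` are all assumptions made for illustration. The sketch generates the data, imposes MAR missingness, and applies a rough version of method (b), chosen only because it is the simplest to write down (the abstract reports that this method performed poorly). Parameter uncertainty is approximated by bootstrapping the observed rows, and the results are pooled with Rubin's rules.

```python
import numpy as np

CUT = 0.0  # assumed cut-point used to dichotomize the continuous covariate


def simulate(n, p_miss, rng):
    """Hypothetical data-generating model; every coefficient is illustrative."""
    z = rng.normal(size=n)                   # auxiliary variable
    x_cont = 0.5 * z + rng.normal(size=n)    # underlying continuous covariate
    x_bin = (x_cont > CUT).astype(float)     # binary covariate used in the analysis
    y = 1.0 * x_bin + rng.normal(size=n)     # fully observed continuous outcome
    # MAR: missingness depends only on the fully observed y and z.
    lin = 0.5 * y + 0.5 * z
    lo, hi = -20.0, 20.0
    for _ in range(60):                      # bisection on the intercept
        mid = (lo + hi) / 2
        if np.mean(1 / (1 + np.exp(-(mid + lin)))) < p_miss:
            lo = mid
        else:
            hi = mid
    miss = rng.uniform(size=n) < 1 / (1 + np.exp(-(lo + lin)))
    x_cont, x_bin = x_cont.copy(), x_bin.copy()
    x_cont[miss] = np.nan                    # both forms are missing together
    x_bin[miss] = np.nan
    return y, z, x_cont, x_bin


def ols_slope(y, x):
    """Slope of y on x and its estimated variance (the analysis model)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], cov[1, 1]


def impute_then_dichotomize(y, z, x_cont, m, rng):
    """Rough version of method (b): impute x_cont linearly, then dichotomize."""
    obs = ~np.isnan(x_cont)
    design = np.column_stack([np.ones_like(y), y, z])
    obs_idx = np.flatnonzero(obs)
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the observed rows to approximate parameter uncertainty.
        idx = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        beta, *_ = np.linalg.lstsq(design[idx], x_cont[idx], rcond=None)
        resid = x_cont[idx] - design[idx] @ beta
        sigma = np.sqrt(resid @ resid / (idx.size - design.shape[1]))
        x_imp = x_cont.copy()
        noise = rng.normal(scale=sigma, size=(~obs).sum())
        x_imp[~obs] = design[~obs] @ beta + noise
        est, var = ols_slope(y, (x_imp > CUT).astype(float))
        estimates.append(est)
        variances.append(var)
    # Rubin's rules: within-imputation plus inflated between-imputation variance.
    q_bar = np.mean(estimates)
    total_var = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
    return q_bar, total_var


if __name__ == "__main__":
    rng = np.random.default_rng(2019)
    y, z, x_cont, x_bin = simulate(n=500, p_miss=0.25, rng=rng)
    est, var = impute_then_dichotomize(y, z, x_cont, m=20, rng=rng)
    print(f"pooled slope: {est:.3f}, pooled SE: {np.sqrt(var):.3f}")
```

Repeating the final block over many simulated datasets and comparing the pooled slope with the value used in `simulate` would give a bias estimate for this one method under these assumed settings. The other methods in the abstract, including SMC-FCS, would need dedicated imputation software and are not sketched here.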
Pages: 467-478
Number of pages: 12