Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data

Cited by: 5
Authors
Witte, Janine [1, 2]
Foraita, Ronja [1]
Didelez, Vanessa [1, 2]
Affiliations
[1] Leibniz Inst Prevent Res & Epidemiol BIPS, Bremen, Germany
[2] Univ Bremen, Fac Math & Comp Sci, Bremen, Germany
Keywords
causal inference; causal search; MICE; missing values; PC-algorithm; structure learning; MISSING DATA; GRAPHICAL MODELS; INFERENCE; DIAGRAMS; NETWORK
DOI
10.1002/sim.9535
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrating example.
Pages: 4716-4743
Number of pages: 28