Machine Learning for Causal Inference: On the Use of Cross-fit Estimators

Cited by: 40
Authors
Zivich, Paul N. [1, 2]
Breskin, Alexander [3]
Affiliations
[1] Univ N Carolina, Gillings Sch Global Publ Hlth, Dept Epidemiol, Chapel Hill, NC 27516 USA
[2] Univ N Carolina, Carolina Populat Ctr, Chapel Hill, NC 27516 USA
[3] NoviSci, Durham, NC USA
Funding
National Institutes of Health (USA);
Keywords
Causal inference; Epidemiologic methods; Machine learning; Observational studies; Super-learner; DOUBLY ROBUST ESTIMATION; PROPENSITY SCORE; ORDER; POPULATION; POSITIVITY; NETWORKS; MODELS; TIME;
DOI
10.1097/EDE.0000000000001332
Chinese Library Classification
R1 [Preventive Medicine, Hygiene];
Discipline classification codes
1004 ; 120402 ;
Abstract
Background: Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties.
Methods: We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators.
Results: With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
Conclusions: Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.
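The idea of a cross-fit doubly robust estimator mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple K-fold cross-fit of an augmented inverse probability weighting (AIPW) estimator, uses plain logistic and linear regression as stand-ins for the ensemble learner, and runs on made-up simulated data with a true average causal effect of 1.0. All function names (`fit_logistic`, `fit_linear`, `crossfit_aipw`) are hypothetical.

```python
import numpy as np

def _design(X):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson (stand-in for any ML classifier)."""
    Xd = _design(X)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        hess = Xd.T @ (Xd * w[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, Xd.T @ (a - p))
    return beta

def fit_linear(X, y):
    """Ordinary least squares (stand-in for any ML regressor)."""
    return np.linalg.lstsq(_design(X), y, rcond=None)[0]

def crossfit_aipw(X, a, y, n_splits=2, seed=0):
    """K-fold cross-fit AIPW estimate of the average causal effect.

    Nuisance models are fit on the training folds and evaluated only on
    the held-out fold, so no observation is predicted by a model that
    was trained on it.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_splits)
    pi, m1, m0 = np.empty(n), np.empty(n), np.empty(n)
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Treatment model: Pr(A = 1 | X)
        b_ps = fit_logistic(X[train], a[train])
        # Outcome model: E[Y | A, X]
        b_out = fit_linear(np.column_stack([a[train], X[train]]), y[train])
        p = 1.0 / (1.0 + np.exp(-_design(X[test]) @ b_ps))
        pi[test] = np.clip(p, 0.01, 0.99)  # guard against extreme weights
        ones, zeros = np.ones(len(test)), np.zeros(len(test))
        m1[test] = _design(np.column_stack([ones, X[test]])) @ b_out
        m0[test] = _design(np.column_stack([zeros, X[test]])) @ b_out
    # AIPW pseudo-outcome for each observation
    phi = m1 - m0 + a * (y - m1) / pi - (1 - a) * (y - m0) / (1 - pi)
    ace = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(n)  # standard error from the pseudo-outcome variance
    return ace, se

# Hypothetical simulated data; the true average causal effect is 1.0.
rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1]))))
y = 1.0 * a + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

ace, se = crossfit_aipw(X, a, y, n_splits=2)
print(f"ACE estimate: {ace:.3f}, 95% CI: ({ace - 1.96 * se:.3f}, {ace + 1.96 * se:.3f})")
```

In practice the two regression helpers would be replaced by a flexible learner such as a super learner, and the sample split is often repeated over several random partitions with the estimates summarized across repetitions; neither refinement is shown here.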
Pages: 393-401
Number of pages: 9
Related papers
50 in total
  • [31] Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study
    Bach, Philipp
    Schacht, Oliver
    Chernozhukov, Victor
    Klaassen, Sven
    Spindler, Martin
    CAUSAL LEARNING AND REASONING, VOL 236, 2024, 236 : 1065 - 1117
  • [32] Integrating Causal Inference and Machine Learning for Early Diagnosis and Management of Diabetes
    Echajei, Sahar
    Hafdane, Mohamed
    Ferjouchia, Hanane
    Rachik, Mostafa
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (06) : 578 - 584
  • [33] Why prefer double robust estimators in causal inference?
    Neugebauer, R
    van der Laan, M
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2005, 129 (1-2) : 405 - 426
  • [34] Stable learning establishes some common ground between causal inference and machine learning
    Cui, Peng
    Athey, Susan
    NATURE MACHINE INTELLIGENCE, 2022, 4 (02) : 110 - 115
  • [35] Stable learning establishes some common ground between causal inference and machine learning
    Peng Cui
    Susan Athey
    Nature Machine Intelligence, 2022, 4 : 110 - 115
  • [36] A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data
    Liu, Licheng
    Wang, Ye
    Xu, Yiqing
    AMERICAN JOURNAL OF POLITICAL SCIENCE, 2024, 68 (01) : 160 - 176
  • [37] Causal Machine Learning and its use for public policy
    Lechner M.
    Swiss Journal of Economics and Statistics, 159 (1)
  • [38] Causality, causal discovery, causal inference and counterfactuals in Civil Engineering: Causal machine learning and case studies for knowledge discovery
    Naser, M. Z.
    Tapeh, Arash Teymori Gharah
    COMPUTERS AND CONCRETE, 2023, 31 (04) : 277 - 292
  • [39] Truly Cross-fit: The Association of Exercise and Clinical Outcomes: Introduction to a JINS Special Section
    Smith, Glenn E.
    Okonkwo, Ozioma C.
    JOURNAL OF THE INTERNATIONAL NEUROPSYCHOLOGICAL SOCIETY, 2021, 27 (08) : 757 - 760
  • [40] The value added of machine learning to causal inference: evidence from revisited studies
    Baiardi, Anna
    Naghi, Andrea A.
    ECONOMETRICS JOURNAL, 2024, 27 (02) : 213 - 234