A fast bootstrap algorithm for causal inference with large data

Cited by: 0
Authors
Kosko, Matthew [1 ]
Wang, Lin [2 ]
Santacatterina, Michele [3 ]
Affiliations
[1] George Washington Univ, Dept Stat, Washington, DC 20052 USA
[2] Purdue Univ, Dept Stat, W Lafayette, IN USA
[3] NYU, Dept Populat Hlth, New York, NY USA
Keywords
causal bootstrap; covariate balance; machine learning; propensity score; real-world data; PROPENSITY-SCORE; INVERSE PROBABILITY; HORMONE-THERAPY; STRATEGIES; ESTIMATORS; VARIANCE
DOI
10.1002/sim.10075
Chinese Library Classification (CLC): Q [Biological Sciences]
Discipline classification codes: 07; 0710; 09
Abstract
Estimating causal effects from large experimental and observational data sets has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique for constructing standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this article, we introduce a new bootstrap algorithm, called the causal bag of little bootstraps, for causal inference with large data. The new algorithm substantially improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate its performance in terms of bias, coverage of 95% confidence intervals, and computational time in a simulation study. We apply it to evaluate the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
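To illustrate the general idea behind a bag-of-little-bootstraps confidence interval for a causal effect, the following is a minimal sketch, not the paper's implementation: it uses a simple difference-in-means ATE estimator (rather than the machine-learning-based estimators the abstract mentions), and the function name `blb_ate_ci` and its defaults (`s` subsets, `r` resamples, subset size `n**gamma`) are hypothetical choices made for this example.

```python
import numpy as np

def blb_ate_ci(Y, T, s=20, r=50, gamma=0.7, alpha=0.05, seed=0):
    """Bag-of-little-bootstraps CI for a difference-in-means ATE.

    Illustrative sketch only: Y is the outcome vector, T the binary
    treatment indicator. Each of the s small subsets (size b = n**gamma)
    is resampled r times with multinomial weights that mimic a full
    size-n bootstrap resample, which is what saves computation.
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    b = int(n ** gamma)  # "little bootstrap" subset size, b << n
    half_widths = []
    for _ in range(s):
        idx = rng.choice(n, size=b, replace=False)  # subsample without replacement
        y, t = Y[idx], T[idx]
        ests = []
        for _ in range(r):
            # Multinomial weights simulate a size-n resample of the b points,
            # so each estimate is computed on only b distinct observations.
            w = rng.multinomial(n, np.ones(b) / b)
            ate = (np.sum(w * t * y) / np.sum(w * t)
                   - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))
            ests.append(ate)
        lo, hi = np.percentile(ests, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        half_widths.append((hi - lo) / 2)
    # Average the interval half-widths across subsets, then center the
    # interval at the full-sample point estimate.
    point = Y[T == 1].mean() - Y[T == 0].mean()
    hw = float(np.mean(half_widths))
    return point, (point - hw, point + hw)
```

Because each resample touches only `b` distinct observations, the per-resample cost scales with `n**gamma` rather than `n`, which is the efficiency gain the abstract refers to; a production version would replace the difference in means with the IPW or covariate-balancing estimator of interest.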
Pages: 2894-2927 (34 pages)
    Foster, Dean P.
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 848 - 853