Causal Inference with Selectively Deconfounded Data

Times cited: 0
Authors
Gan, Kyra [1]
Li, Andrew A. [1]
Lipton, Zachary C. [1]
Tayur, Sridhar [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
PROPENSITY SCORE;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given only data generated by a standard confounding graph with an unobserved confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we consider the benefit of incorporating a large confounded observational dataset (confounder unobserved) alongside a small deconfounded observational dataset (confounder revealed) when estimating the ATE. Our theoretical results suggest that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some cases (say, genetics), we could imagine retrospectively selecting samples to deconfound. We demonstrate that by actively selecting these samples based upon the (already observed) treatment and outcome, we can reduce sample complexity further. Our theoretical and empirical results establish that the worst-case relative performance of our approach (vs. a natural benchmark) is bounded, while our best-case gains are unbounded. Finally, we demonstrate the benefits of selective deconfounding using a large real-world dataset related to genetic mutation in cancer.
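A small simulation can illustrate the idea of combining a large confounded dataset with a few deconfounded samples. This is a minimal sketch under stated assumptions and is not the authors' estimator. It assumes a binary confounder Z, a binary treatment X and a binary outcome Y. It estimates P(x, y) from the confounded data and estimates P(z | x, y) from a small subsample whose Z is revealed. It then combines the two through the standard backdoor adjustment. The following are all hypothetical choices made for illustration:

- the ground-truth probabilities;
- the function names (`true_joint`, `backdoor_ate`, `simulate`, `combined_estimate`);
- the equal split of the deconfounding budget across the four (x, y) cells;
- the Laplace smoothing.

```python
import numpy as np

# Hypothetical ground-truth model with binary Z (confounder), X (treatment), Y (outcome).
P_Z1 = 0.5                                   # P(Z=1)
P_X1_GIVEN_Z = np.array([0.2, 0.8])          # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = np.array([[0.1, 0.5],        # P(Y=1 | X=0, Z=z)
                          [0.3, 0.9]])       # P(Y=1 | X=1, Z=z)


def true_joint():
    """Exact P(X=x, Y=y, Z=z) of the hypothetical model, indexed [x, y, z]."""
    joint = np.zeros((2, 2, 2))
    for z in (0, 1):
        pz = P_Z1 if z else 1 - P_Z1
        for x in (0, 1):
            px = P_X1_GIVEN_Z[z] if x else 1 - P_X1_GIVEN_Z[z]
            py1 = P_Y1_GIVEN_XZ[x, z]
            joint[x, 1, z] = pz * px * py1
            joint[x, 0, z] = pz * px * (1 - py1)
    return joint


def backdoor_ate(p_xy, p_z_given_xy):
    """ATE = sum_z P(z) * [P(Y=1 | X=1, z) - P(Y=1 | X=0, z)].

    p_xy[x, y] = P(x, y) comes from the confounded data;
    p_z_given_xy[x, y, z] = P(z | x, y) comes from the deconfounded samples.
    """
    joint = p_xy[:, :, None] * p_z_given_xy            # P(x, y, z)
    p_z = joint.sum(axis=(0, 1))                        # P(z)
    p_y1_given_xz = joint[:, 1, :] / joint.sum(axis=1)  # P(Y=1 | x, z), indexed [x, z]
    return float(np.sum(p_z * (p_y1_given_xz[1] - p_y1_given_xz[0])))


def simulate(n, rng):
    """Draw n i.i.d. (x, y, z) triples from the hypothetical model."""
    z = (rng.random(n) < P_Z1).astype(int)
    x = (rng.random(n) < P_X1_GIVEN_Z[z]).astype(int)
    y = (rng.random(n) < P_Y1_GIVEN_XZ[x, z]).astype(int)
    return x, y, z


def combined_estimate(x, y, z, k, rng):
    """Use every (x, y) pair, but read z for at most k // 4 samples per (x, y) cell."""
    n = len(x)
    p_xy = np.zeros((2, 2))
    p_z_given_xy = np.zeros((2, 2, 2))
    for xv in (0, 1):
        for yv in (0, 1):
            idx = np.flatnonzero((x == xv) & (y == yv))
            p_xy[xv, yv] = len(idx) / n  # confounded data: Z is never used here
            # Selective deconfounding: choose samples by their observed (x, y) cell.
            chosen = rng.choice(idx, size=min(k // 4, len(idx)), replace=False)
            # Laplace smoothing keeps P(z | x, y) defined even if no sample is chosen.
            p_z1 = (z[chosen].sum() + 1) / (len(chosen) + 2)
            p_z_given_xy[xv, yv] = [1 - p_z1, p_z1]
    return backdoor_ate(p_xy, p_z_given_xy)


if __name__ == "__main__":
    joint = true_joint()
    p_xy = joint.sum(axis=2)
    print(backdoor_ate(p_xy, joint / p_xy[:, :, None]))  # approximately 0.3
    naive = p_xy[1, 1] / p_xy[1].sum() - p_xy[0, 1] / p_xy[0].sum()
    print(naive)  # approximately 0.6 (biased by the confounder)
    rng = np.random.default_rng(0)
    x, y, z = simulate(100_000, rng)
    print(combined_estimate(x, y, z, k=2_000, rng=rng))
```

For this hypothetical model, the backdoor formula applied to the exact distribution gives an ATE of 0.3. The confounded contrast P(Y=1 | X=1) - P(Y=1 | X=0) gives 0.6. With 100,000 confounded samples and only 2,000 revealed confounders, `combined_estimate` should land close to 0.3. Splitting the budget evenly over the (x, y) cells is just one way to select samples using the already observed treatment and outcome. The paper's own selection policies may differ.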
Pages: 11
Related papers
  • [41] Causal inference with observational data: the need for triangulation of evidence
    Hammerton, Gemma
    Munafo, Marcus R.
    [J]. PSYCHOLOGICAL MEDICINE, 2021, 51 (04) : 563 - 578
  • [42] Causal inference and effect estimation using observational data
    Igelstrom, Erik
    Craig, Peter
    Lewsey, Jim
    Lynch, John
    Pearce, Anna
    Katikireddi, Srinivasa Vittal
    [J]. JOURNAL OF EPIDEMIOLOGY AND COMMUNITY HEALTH, 2022, 76 (11) : 960 - 966
  • [43] Functional Causal Inference with Time-to-Event Data
    Gao, Xiyuan
    Wang, Jiayi
    Hu, Guanyu
    Sun, Jianguo
    [J]. STATISTICS IN BIOSCIENCES, 2024
  • [44] Causal inference for semi-competing risks data
    Nevo, Daniel
    Gorfine, Malka
    [J]. BIOSTATISTICS, 2022, 23 (04) : 1115 - 1132
  • [45] Causal Inference with Genetic Data: Past, Present, and Future
    Pingault, Jean-Baptiste
    Richmond, Rebecca
    Smith, George Davey
    [J]. COLD SPRING HARBOR PERSPECTIVES IN MEDICINE, 2022, 12 (03)
  • [46] ASSESSING STATISTICAL METHODS FOR CAUSAL INFERENCE IN OBSERVATIONAL DATA
    Parks, D. C.
    Lin, X.
    Lee, K. R.
    [J]. VALUE IN HEALTH, 2014, 17 (07) : A731 - A731
  • [47] Mendelian randomization: causal inference leveraging genetic data
    Chen, Lane G.
    Tubbs, Justin D.
    Liu, Zipeng
    Thach, Thuan-Quoc
    Sham, Pak C.
    [J]. PSYCHOLOGICAL MEDICINE, 2024, 54 (08) : 1461 - 1474
  • [49] Causal Inference on Multivariate and Mixed-Type Data
    Marx, Alexander
    Vreeken, Jilles
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT II, 2019, 11052 : 655 - 671
  • [50] ZaliQL: Causal Inference from Observational Data at Scale
    Salimi, Babak
    Cole, Corey
    Ports, Dan R. K.
    Suciu, Dan
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (12) : 1957 - 1960