Ultra-high dimensional variable selection for doubly robust causal inference

Times cited: 5
Authors
Tang, Dingke [1]
Kong, Dehan [1]
Pan, Wenliang [2]
Wang, Linbo [1]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON M5S 3G3, Canada
[2] Sun Yat Sen Univ, Sch Math, Dept Stat Sci, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Alzheimer's disease; average causal effect; ball covariance; confounder selection; variable screening; PROPENSITY SCORE; ALZHEIMERS-DISEASE; MODEL SELECTION; ADAPTIVE LASSO; EFFICIENT; TAU; BIOMARKERS
DOI
10.1111/biom.13625
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Causal inference increasingly relies on observational studies with rich covariate information. To build tractable causal procedures, such as doubly robust estimators, one must first extract the important features from high- or even ultra-high-dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra-high-dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving the efficiency of the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome-model-free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Analyses of synthetic and real data show that our proposal performs favorably compared with existing methods in a range of realistic settings. Data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
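The abstract does not state the screening statistic itself. The sketch below is only a rough illustration of outcome-model-free screening with a ball-covariance-type dependence measure. It ranks covariates by an unweighted sample ball covariance with the outcome, computed separately within each treatment arm and then summed. Several pieces are our own assumptions rather than the paper's causal ball screening procedure: the within-arm summation, the hypothetical names `ball_covariance` and `screen_by_arm`, and the `keep` cutoff. The ball covariance formula follows our reading of the sample statistic, so check it against the original definition before relying on it.

```python
from typing import List, Sequence


def ball_covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Unweighted sample ball covariance (squared form) of two 1-D samples.

    For every pair (i, j), take the closed ball centred at x[i] with radius
    |x[i] - x[j]| and the matching ball in y-space; average the squared gap
    between the joint and the product of the empirical ball probabilities.
    Assumed definition; O(n^3) time, so only suitable for small samples.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        for j in range(n):
            rx = abs(x[i] - x[j])  # ball radius in x-space
            ry = abs(y[i] - y[j])  # ball radius in y-space
            in_x = [abs(x[k] - x[i]) <= rx for k in range(n)]
            in_y = [abs(y[k] - y[i]) <= ry for k in range(n)]
            p_xy = sum(1 for a, b in zip(in_x, in_y) if a and b) / n
            p_x = sum(in_x) / n
            p_y = sum(in_y) / n
            total += (p_xy - p_x * p_y) ** 2
    return total / (n * n)


def screen_by_arm(
    covariates: List[Sequence[float]],
    outcome: Sequence[float],
    treatment: Sequence[int],
    keep: int,
) -> List[int]:
    """Hypothetical screening rule (not the paper's exact method): score each
    covariate by the sum over treatment arms of its within-arm ball covariance
    with the outcome, then return the indices of the `keep` top scorers."""
    arms = sorted(set(treatment))
    scored = []
    for j, column in enumerate(covariates):
        score = 0.0
        for arm in arms:
            rows = [i for i, t in enumerate(treatment) if t == arm]
            score += ball_covariance(
                [column[i] for i in rows], [outcome[i] for i in rows]
            )
        scored.append((score, j))
    # Highest score first; ties broken by the smaller covariate index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:keep]]
```

Because the score depends only on pairwise distances, no regression of the outcome on the covariates is fitted. A covariate that is constant within every arm scores exactly zero. In this sketch, the retained indices would then be carried into both the propensity score model and the outcome model of a doubly robust estimator. The triple loop costs O(n^3) per covariate, so an ultra-high-dimensional application would need a much faster implementation.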
Pages: 903-914
Number of pages: 12