PROGRAM EVALUATION AND CAUSAL INFERENCE WITH HIGH-DIMENSIONAL DATA

Cited by: 181
Authors
Belloni, A. [1]
Chernozhukov, V. [2]
Fernandez-Val, I. [3]
Hansen, C. [4]
Affiliations
[1] Duke Univ, Fuqua Sch Business, 100 Fuqua Dr,POB 90120,Off W312, Durham, NC 27708 USA
[2] MIT, Dept Econ, 50 Mem Dr,E52-361B, Cambridge, MA 02142 USA
[3] Boston Univ, Dept Econ, 270 Bay State Rd,Room 415A, Boston, MA 02215 USA
[4] Univ Chicago, Booth Sch Business, 5807 S Woodlawn Ave, Chicago, IL 60637 USA
Funding
US National Science Foundation
Keywords
Machine learning; causality; Neyman orthogonality; heterogeneous treatment effects; endogeneity; local average and quantile treatment effects; instruments; local effects of treatment on the treated; propensity score; Lasso; inference after model selection; moment-condition models; moment-condition models with a continuum of target parameters; Lasso and Post-Lasso with functional response data; randomized control trials; EFFICIENT SEMIPARAMETRIC ESTIMATION; POST-REGULARIZATION INFERENCE; SQUARE-ROOT LASSO; QUANTILE REGRESSION; MODEL-SELECTION; INSTRUMENTAL VARIABLES; MOMENT RESTRICTIONS; ASYMPTOTIC THEORY; PROPENSITY SCORE; LINEAR-MODELS;
DOI
10.3982/ECTA12723
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. We can handle very many control variables, endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. Our framework covers the special case of exogenous receipt of treatment, either conditional on controls or unconditionally as in randomized control trials. In the latter case, our approach produces efficient estimators and honest bands for (functional) average treatment effects (ATE) and quantile treatment effects (QTE). To make informative inference possible, we assume that key reduced-form predictive relationships are approximately sparse. This assumption allows the use of regularization and selection methods to estimate those relations, and we provide methods for post-regularization and post-selection inference that are uniformly valid (honest) across a wide range of models. We show that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced-form functional parameters. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) eligibility and participation on accumulated assets. The results on program evaluation are obtained as a consequence of more general results on honest inference in a general moment-condition framework, which arises from structural equation models in econometrics. Here, too, the crucial ingredient is the use of orthogonal moment conditions, which can be constructed from the initial moment conditions. 
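The abstract names orthogonal, or doubly robust, moment conditions as the key to honest inference. The simplest case it covers is the average treatment effect (ATE) under exogenous treatment. The sketch below is a minimal illustration, not the authors' implementation. It uses the standard augmented inverse-propensity-weighted (AIPW) score for the ATE. The function name `dr_ate` is hypothetical. The nuisance estimates (the outcome regressions and the propensity score) are supplied directly, not learned by Lasso or another machine learning method as in the paper. The simulated design and its true nuisance functions are also assumptions made for illustration.

```python
import numpy as np

def dr_ate(y, d, g1_hat, g0_hat, m_hat):
    """Doubly robust (AIPW) estimate of the ATE and its standard error.

    y      : observed outcomes
    d      : binary treatment indicator (0/1)
    g1_hat : fitted E[Y | D=1, X] for every unit
    g0_hat : fitted E[Y | D=0, X] for every unit
    m_hat  : fitted propensity score P(D=1 | X) for every unit
    """
    y, d = np.asarray(y, float), np.asarray(d, float)
    # Orthogonal score: regression contrast plus inverse-propensity-weighted
    # residual corrections, which makes the estimate insensitive, to first
    # order, to small errors in the nuisance estimates.
    psi = (g1_hat - g0_hat
           + d * (y - g1_hat) / m_hat
           - (1 - d) * (y - g0_hat) / (1 - m_hat))
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return theta, se


# Hypothetical simulated design with a true ATE of 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.standard_normal(n)
m_true = 1 / (1 + np.exp(-x))            # true propensity score
d = rng.binomial(1, m_true)
y = 1 + x + 2 * d + rng.standard_normal(n)

# Correct nuisance functions.
theta_ok, se_ok = dr_ate(y, d, 3 + x, 1 + x, m_true)
# Outcome models deliberately wrong (set to zero), propensity still correct.
zeros = np.zeros(n)
theta_ipw, se_ipw = dr_ate(y, d, zeros, zeros, m_true)
print(f"correct nuisances:   {theta_ok:.3f} (se {se_ok:.3f})")
print(f"wrong outcome model: {theta_ipw:.3f} (se {se_ipw:.3f})")
```

Because the score is doubly robust, the second estimate stays close to the true effect of 2 even though the outcome regressions are set to zero, provided the propensity score is correct. The cost is a noticeably larger standard error.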
We provide results on honest inference for (function-valued) parameters within this general framework. In this framework, any high-quality machine learning method (e.g., boosted trees, deep neural networks, random forests, and their aggregated and hybrid versions) can be used to learn the nonparametric or high-dimensional components of the model. These results rest on several auxiliary results of major independent interest: we (1) prove uniform validity of a multiplier bootstrap, (2) offer a uniformly valid functional delta method, and (3) provide results for sparsity-based estimation of regression functions for function-valued outcomes.
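The abstract lists a multiplier bootstrap for uniformly valid inference on function-valued parameters, such as quantile treatment effects over a grid of quantile levels. The sketch below shows one common form of this idea: a Gaussian-multiplier, sup-t confidence band built from per-observation score values. It is an assumption-laden illustration, not the paper's exact algorithm. The function name `multiplier_bootstrap_band` is hypothetical. Gaussian weights are one choice among several. The score matrix is simulated rather than estimated.

```python
import numpy as np

def multiplier_bootstrap_band(psi, alpha=0.05, n_boot=2000, seed=0):
    """Uniform (sup-t) band for theta(u) = E[psi(u)] on a grid u_1, ..., u_K.

    psi : (n, K) array; column k holds the estimated score values for
          the parameter at grid point u_k (e.g., one quantile level).
    Returns the point estimates, lower and upper band limits, and the
    bootstrap critical value.
    """
    psi = np.asarray(psi, float)
    n = psi.shape[0]
    theta = psi.mean(axis=0)
    centered = psi - theta
    sigma = centered.std(axis=0, ddof=1)
    rng = np.random.default_rng(seed)
    # Gaussian multipliers: one weight per observation in each bootstrap draw.
    xi = rng.standard_normal((n_boot, n))
    # Draws of the studentized process sqrt(n) * (theta* - theta) / sigma.
    draws = (xi @ centered) / (np.sqrt(n) * sigma)
    # Critical value: (1 - alpha) quantile of the sup over the grid.
    crit = np.quantile(np.abs(draws).max(axis=1), 1 - alpha)
    half = crit * sigma / np.sqrt(n)
    return theta, theta - half, theta + half, crit


# Hypothetical score values for 20 grid points (independent columns).
rng = np.random.default_rng(1)
psi_many = rng.standard_normal((1000, 20)) + np.linspace(0, 1, 20)
theta, lo, hi, crit_many = multiplier_bootstrap_band(psi_many)
_, _, _, crit_single = multiplier_bootstrap_band(psi_many[:, :1])
print(f"sup-t critical value, 20 points: {crit_many:.2f}")
print(f"critical value, 1 point: {crit_single:.2f}")
```

With one grid point the critical value is close to the pointwise 1.96. With 20 grid points it is larger, and that widening is what lets the band cover the whole function simultaneously rather than one point at a time.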
Pages: 233-298
Number of pages: 66