Random Forests Approach for Causal Inference with Clustered Observational Data

Cited by: 10
Authors
Suk, Youmi [1]
Kang, Hyunseung [2]
Kim, Jee-Seon [1]
Affiliations
[1] Univ Wisconsin Madison, Dept Educ Psychol, Madison, WI 53706 USA
[2] Univ Wisconsin Madison, Dept Stat, Madison, WI USA
Keywords
Causal inference; machine learning methods; multilevel propensity score matching; multilevel observational data; hierarchical linear modeling; propensity score estimation; selection bias; stratification; regression; impact
DOI
10.1080/00273171.2020.1808437
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
There is a growing interest in using machine learning (ML) methods for causal inference due to their (nearly) automatic and flexible ability to model key quantities such as the propensity score or the outcome model. Unfortunately, most ML methods for causal inference have been studied under single-level settings where all individuals are independent of each other, and there is little work on using these methods with clustered or nested data, a common setting in education studies. This paper investigates using one particular ML method based on random forests, known as Causal Forests, to estimate treatment effects in multilevel observational data. We conduct simulation studies under different types of multilevel data, including two-level, three-level, and cross-classified data. Our simulation studies show that when the ML method is supplemented with estimated propensity scores from multilevel models that account for clustered/hierarchical structure, the modified ML method outperforms preexisting methods in a wide variety of settings. We conclude by estimating the effect of private math lessons in the Trends in International Mathematics and Science Study data, a large-scale educational assessment where students are nested within schools.
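The abstract's central point, that propensity scores should account for cluster structure, can be illustrated with a small simulation. The sketch below is not the authors' method: it uses neither Causal Forests nor a multilevel model. As a simplifying assumption, it swaps in normalized (Hajek) inverse probability weighting, and it uses each cluster's observed treated proportion as a crude stand-in for a cluster-aware propensity score. The data-generating process, the function names (`simulate`, `hajek_ipw`), and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def simulate(n_clusters=200, cluster_size=50, tau=1.0, seed=0):
    """Return rows (cluster_id, treated, outcome) for a hypothetical two-level design.

    An unobserved cluster effect u raises both the chance of treatment and
    the outcome, so it confounds any analysis that ignores clusters.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        u = rng.gauss(0.0, 1.0)                # cluster-level confounder
        p_treat = 1.0 / (1.0 + math.exp(-u))   # logistic treatment probability
        for _ in range(cluster_size):
            t = 1 if rng.random() < p_treat else 0
            y = tau * t + 2.0 * u + rng.gauss(0.0, 1.0)
            rows.append((j, t, y))
    return rows


def hajek_ipw(rows, propensity):
    """Normalized IPW estimate of the average treatment effect.

    `propensity` maps cluster id -> estimated P(T = 1). Rows whose score is
    not strictly between 0 and 1 are skipped (no overlap in that cluster).
    """
    num1 = den1 = num0 = den0 = 0.0
    for j, t, y in rows:
        e = propensity[j]
        if not 0.0 < e < 1.0:
            continue
        if t == 1:
            num1 += y / e
            den1 += 1.0 / e
        else:
            num0 += y / (1.0 - e)
            den0 += 1.0 / (1.0 - e)
    return num1 / den1 - num0 / den0


rows = simulate()

# Tally treated counts and sizes per cluster.
treated = defaultdict(int)
size = defaultdict(int)
for j, t, _ in rows:
    treated[j] += t
    size[j] += 1

# Single-level propensity: one overall treated proportion for everyone.
overall = sum(treated.values()) / len(rows)
pooled_scores = {j: overall for j in size}

# Cluster-aware propensity (stand-in): treated proportion within each cluster.
cluster_scores = {j: treated[j] / size[j] for j in size}

pooled_est = hajek_ipw(rows, pooled_scores)
clustered_est = hajek_ipw(rows, cluster_scores)
print(f"true effect: 1.0, pooled: {pooled_est:.2f}, cluster-aware: {clustered_est:.2f}")
```

With a constant propensity score the weighted estimator reduces to a plain difference in means, which absorbs the cluster-level confounding. The per-cluster score compares treated and control units within the same cluster, so the cluster effect cancels out.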
Pages: 829 - 852
Number of pages: 24
Related papers
50 in total
  • [31] Causal Inference in Geoscience and Remote Sensing From Observational Data
    Perez-Suay, Adrian
    Camps-Valls, Gustau
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2019, 57 (03) : 1502 - 1513
  • [32] Mutual Information Based Matching for Causal Inference with Observational Data
    Sun, Lei
    Nikolaev, Alexander G.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [33] Using genetic data to strengthen causal inference in observational research
    Jean-Baptiste Pingault
    Paul F. O’Reilly
    Tabea Schoeler
    George B. Ploubidis
    Frühling Rijsdijk
    Frank Dudbridge
    Nature Reviews Genetics, 2018, 19 : 566 - 580
  • [34] Using genetic data to strengthen causal inference in observational research
    Pingault, Jean-Baptiste
    O'Reilly, Paul F.
    Schoeler, Tabea
    Ploubidis, George B.
    Rijsdijk, Fruhling
    Dudbridge, Frank
    NATURE REVIEWS GENETICS, 2018, 19 (09) : 566 - 580
  • [35] Causal inference from observational data in emergency medicine research
    Catoire, Pierre
    Genuer, Robin
    Proust-Lima, Cecile
    EUROPEAN JOURNAL OF EMERGENCY MEDICINE, 2023, 30 (02) : 67 - 69
  • [36] Causal inference on the impact of nutrition policies using observational data
    Mazzocchi, Mario
    Capacci, Sara
    Biondi, Beatrice
    BIO-BASED AND APPLIED ECONOMICS, 2022, 11 (01) : 3 - 20
  • [37] Causal inference on observational data: Opportunities and challenges in earthquake engineering
    Burton, Henry
    EARTHQUAKE SPECTRA, 2023, 39 (01) : 54 - 76
  • [38] Causal Inference in Industrial Alarm Data by Timely Clustered Alarms and Transfer Entropy
    Fahimipirehgalin, Mina
    Weiss, Iris
    Vogel-Heuser, Birgit
    2020 EUROPEAN CONTROL CONFERENCE (ECC 2020), 2020 : 2056 - 2061
  • [39] Causal Inference for Observational Studies
    Kaplan, David
    JOURNAL OF INFECTIOUS DISEASES, 2019, 219 (01) : 1 - 2
  • [40] A Causal Inference Model Based on Random Forests to Identify the Effect of Soil Moisture on Precipitation
    Li, Lu
    Shangguan, Wei
    Deng, Yi
    Mao, Jiafu
    Pan, Jinjing
    Wei, Nan
    Yuan, Hua
    Zhang, Shupeng
    Zhang, Yonggen
    Dai, Yongjiu
    JOURNAL OF HYDROMETEOROLOGY, 2020, 21 (05) : 1115 - 1131