RECONCILING DESIGN-BASED AND MODEL-BASED CAUSAL INFERENCES FOR SPLIT-PLOT EXPERIMENTS

Times cited: 5
Authors
Zhao, Anqi [1]
Ding, Peng [2]
Affiliations
[1] Natl Univ Singapore, Dept Stat & Data Sci, Singapore, Singapore
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Source
ANNALS OF STATISTICS | 2022 / Vol. 50 / No. 2
Funding
U.S. National Science Foundation (NSF);
Keywords
Cluster randomization; cluster-robust standard error; covariate adjustment; inverse probability weighting; potential outcome; randomization inference; REGRESSION ADJUSTMENTS;
DOI
10.1214/21-AOS2144
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract
The split-plot design originated in agricultural science, with experimental units, also known as subplots, nested within groups known as whole plots. It assigns different interventions at the whole-plot and subplot levels, providing a convenient way to accommodate hard-to-change factors. By design, subplots within the same whole plot receive the same level of the whole-plot intervention, which induces a group structure on the final treatment assignments. A common strategy is to run an ordinary least squares (OLS) regression of the outcome on the treatment indicators, coupled with robust standard errors clustered at the whole-plot level. This strategy does not give consistent estimators for the treatment effects of interest when the whole-plot sizes vary. Another common strategy is to fit a linear mixed-effects model of the outcome with normal random effects and errors. This is a purely model-based approach and can be sensitive to violations of its parametric assumptions. In contrast, design-based inference assumes no outcome model and relies solely on the controllable randomization mechanism determined by the physical experiment. We first extend the existing design-based inference based on the Horvitz-Thompson estimator to the Hajek estimator, and establish finite-population central limit theorems for both under split-plot randomization. We then reconcile these results with those under the model-based approach and propose two regression strategies, namely (i) the weighted least squares (WLS) fit of the unit-level data based on inverse probability weighting and (ii) the OLS fit of the aggregate data based on whole-plot total outcomes, which reproduce the Hajek and Horvitz-Thompson estimators, respectively.
This, together with the asymptotic conservativeness of the corresponding cluster-robust covariances for estimating the true design-based covariances, which we establish in the process, justifies the validity of the regression estimators for design-based inference. Given the flexibility of the regression formulation for covariate adjustment, we further extend the theory to the case with covariates and demonstrate the efficiency gain from regression-based covariate adjustment via both asymptotic theory and simulation. Importantly, all our theoretical results are either numeric or design-based, and hold regardless of how well the regression equations represent the true data-generating process.
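The two regression identities described in the abstract are numerical, so they can be checked on simulated data. The Python sketch below is illustrative only and is not the authors' code. It assumes a 2 x 2 split-plot design with complete randomization at both stages (half of the whole plots receive A = 1; within whole plot w, floor(M_w/2) subplots receive B = 1), an aggregation that scales each whole plot's outcomes as T_wb = (W/N) * sum of Y over subplots with B = b, divided by q_wb = M_wb/M_w, and a CR0-type cluster-robust covariance without small-sample correction. These choices, the variable names, and the helper `cluster_robust_cov` are assumptions made for this example; the paper's exact definitions may differ. The script computes the Horvitz-Thompson and Hajek estimators directly, then fits (i) unit-level WLS with inverse-probability weights and (ii) whole-plot-level OLS on the scaled totals.

```python
import numpy as np

rng = np.random.default_rng(2022)

# --- Hypothetical split-plot layout (sizes and effects are made up) ---
W = 8                                   # number of whole plots
M = rng.integers(3, 8, size=W)          # varying whole-plot sizes M_w
N = int(M.sum())                        # total number of subplots
wp = np.repeat(np.arange(W), M)         # whole-plot label of each subplot

# Stage 1: completely randomize W/2 whole plots to whole-plot level A = 1.
A_wp = np.zeros(W, dtype=int)
A_wp[rng.permutation(W)[: W // 2]] = 1
p_a = np.array([np.mean(A_wp == 0), np.mean(A_wp == 1)])  # W_a / W

# Stage 2: inside each whole plot, randomize floor(M_w/2) subplots to B = 1.
B = np.zeros(N, dtype=int)
q1 = np.empty(W)                        # q_w1 = M_w1 / M_w, varies with M_w
for w in range(W):
    idx = np.flatnonzero(wp == w)
    m1 = M[w] // 2
    B[rng.permutation(idx)[:m1]] = 1
    q1[w] = m1 / M[w]
A = A_wp[wp]

# Observed outcomes from an arbitrary data-generating process.
Y = 1.0 + A + 0.5 * B + rng.normal(size=W)[wp] + rng.normal(size=N)

# Assignment probability of the treatment each subplot actually received.
pi = p_a[A] * np.where(B == 1, q1[wp], 1.0 - q1[wp])
z = 2 * A + B                           # combos (0,0), (0,1), (1,0), (1,1)
X = np.eye(4)[z]                        # saturated treatment indicators

# Direct Horvitz-Thompson and Hajek estimators of the four population means.
ht = np.array([np.sum(Y[z == k] / pi[z == k]) / N for k in range(4)])
hajek = np.array([np.sum(Y[z == k] / pi[z == k]) / np.sum(1.0 / pi[z == k])
                  for k in range(4)])

# (i) Unit-level WLS with inverse-probability weights 1/pi.
wts = 1.0 / pi
sw = np.sqrt(wts)
beta_wls = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)[0]

# (ii) Whole-plot-level OLS on scaled totals, one row per (whole plot, b).
rows, T = [], []
for w in range(W):
    for b in (0, 1):
        m = (wp == w) & (B == b)
        q_wb = q1[w] if b == 1 else 1.0 - q1[w]
        T.append(W / N * np.sum(Y[m]) / q_wb)
        rows.append(2 * A_wp[w] + b)
X_agg = np.eye(4)[rows]
beta_agg = np.linalg.lstsq(X_agg, np.array(T), rcond=None)[0]


def cluster_robust_cov(X, y, weights, clusters, beta):
    """CR0 sandwich covariance for a weighted LS fit, clustered by `clusters`."""
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ (X * weights[:, None]))
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        m = clusters == g
        score = X[m].T @ (weights[m] * resid[m])  # summed score of cluster g
        meat += np.outer(score, score)
    return bread @ meat @ bread


V_wls = cluster_robust_cov(X, Y, wts, wp, beta_wls)
print("Hajek:", hajek.round(3), " WLS:", beta_wls.round(3))
print("HT:   ", ht.round(3), " aggregate OLS:", beta_agg.round(3))
print("cluster-robust SEs:", np.sqrt(np.diag(V_wls)).round(3))
```

Because a saturated weighted regression returns weighted group means, the WLS coefficients coincide with the Hajek estimates. Likewise, each aggregate OLS coefficient averages T_wb over the W_a whole plots at level a, which equals the Horvitz-Thompson estimate. Both identities are algebraic, so under this sketch they hold for any outcome vector Y, whatever the outcome model.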
Pages: 1170-1192
Number of pages: 23