Model-assisted calibration of non-probability sample survey data using adaptive LASSO

Cited by: 0
Authors
Chen, Jack Kuang Tsung [1]
Valliant, Richard L. [2]
Elliott, Michael R. [3, 4]
Affiliations
[1] Survey Monkey Inc, Palo Alto, CA 94301 USA
[2] Univ Michigan, Inst Social Res, Survey Res Ctr, Ann Arbor, MI USA
[3] Univ Michigan, Sch Publ Hlth, Survey Res Ctr, Inst Social Res, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Sch Publ Hlth, Dept Biostat, Ann Arbor, MI 48109 USA
Keywords
Adaptive LASSO estimators; Generalized regression estimator; Non-representative sample; Over-fitting; Variable selection; Oracle property; REGRESSION ESTIMATION; SELECTION
DOI
Not available
Chinese Library Classification number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
The probability-sampling-based framework has dominated survey research because it provides precise mathematical tools to assess sampling variability. However, increasing costs and declining response rates are expanding the use of non-probability samples, particularly in general population settings, where samples of individuals pulled from web surveys are becoming increasingly cheap and easy to access. But non-probability samples are at risk of selection bias due to differential access, degrees of interest, and other factors. Calibration to known statistical totals in the population provides a means of potentially diminishing the effect of selection bias in non-probability samples. Here we show that model calibration using adaptive LASSO can yield a consistent estimator of a population total as long as a subset of the true predictors is included in the prediction model, thus allowing large numbers of possible covariates to be included without risk of overfitting. We show that model calibration using adaptive LASSO provides improved estimation with respect to mean square error relative to standard competitors such as generalized regression (GREG) estimators when a large number of covariates are required to determine the true model, with effectively no loss in efficiency over GREG when smaller models will suffice. We also derive closed-form variance estimators of population totals, and compare their behavior with bootstrap estimators. We conclude with a real-world example using data from the National Health Interview Survey.
Pages: 117-144
Page count: 28