Post-model-selection inference in linear regression models: An integrated review

Cited by: 11
Authors
Zhang, Dongliang [1]
Khalili, Abbas [2]
Asgharian, Masoud [2]
Affiliations
[1] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
High-dimensional linear models; model selection; population- and projection-based regression coefficients; post-selection inference; VALID CONFIDENCE-INTERVALS; VARIABLE SELECTION; COVERAGE PROBABILITY; ADAPTIVE LASSO; P-VALUES; REGIONS; ESTIMATORS; BOOTSTRAP; UNIFORM; RECOVERY;
DOI
10.1214/22-SS135
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naive confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods.
Pages: 86-136
Page count: 51