Post-model-selection inference in linear regression models: An integrated review

Cited by: 11

Authors:

Zhang, Dongliang [1]

Khalili, Abbas [2]

Asgharian, Masoud [2]

Affiliations:

[1] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA

[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada

Source:

STATISTICS SURVEYS | 2022 / Vol. 16

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

High-dimensional linear models; model selection; population- and projection-based regression coefficients; post-selection inference; VALID CONFIDENCE-INTERVALS; VARIABLE SELECTION; COVERAGE PROBABILITY; ADAPTIVE LASSO; P-VALUES; REGIONS; ESTIMATORS; BOOTSTRAP; UNIFORM; RECOVERY;

DOI:

10.1214/22-SS135

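Chinese Library Classification (CLC):

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract: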

The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naive confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods.

Citation

Pages: 86 - 136

Page count: 51

50 items in total

[21] Can one estimate the conditional distribution of post-model-selection estimators?
Leeb, Hannes
Poetscher, Benedikt M.
[J]. ANNALS OF STATISTICS, 2006, 34 (05) : 2554 - 2591
[22] A review of Bayesian group selection approaches for linear regression models
Lai, Wei-Ting
Chen, Ray-Bing
[J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2021, 13 (04)
[23] On the Length of Post-Model-Selection Confidence Intervals Conditional on Polyhedral Constraints
Kivaranovic, Danijel
Leeb, Hannes
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (534) : 845 - 857
[24] Non-Bayesian Post-Model-Selection Estimation as Estimation Under Model Misspecification
Harel, Nadav
Routtenberg, Tirza
[J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 3641 - 3657
[25] Integrated likelihood inference in semiparametric regression models
He, H.
Severini, T. A.
[J]. METRON-INTERNATIONAL JOURNAL OF STATISTICS, 2014, 72 (02) : 185 - 199
[26] Predictive inference on equicorrelated linear regression models
Khan, S
Bhatti, MI
[J]. APPLIED MATHEMATICS AND COMPUTATION, 1998, 95 (2-3) : 205 - 217
[27] LIKELIHOOD INFERENCE FOR LINEAR-REGRESSION MODELS
DICICCIO, TJ
[J]. BIOMETRIKA, 1988, 75 (01) : 29 - 34
[28] Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments
Chernozhukov, Victor
Hansen, Christian
Spindler, Martin
[J]. AMERICAN ECONOMIC REVIEW, 2015, 105 (05) : 486 - 490
[29] Post-J test inference in non-nested linear regression models
CHEN XinJie
FAN YanQin
WAN Alan
ZOU GuoHua
[J]. Science China Mathematics, 2015, 58 (06) : 1203 - 1216
