Post-model-selection inference in linear regression models: An integrated review

Cited by: 11
Authors
Zhang, Dongliang [1]
Khalili, Abbas [2]
Asgharian, Masoud [2]
Affiliations
[1] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
High-dimensional linear models; model selection; population- and projection-based regression coefficients; post-selection inference; VALID CONFIDENCE-INTERVALS; VARIABLE SELECTION; COVERAGE PROBABILITY; ADAPTIVE LASSO; P-VALUES; REGIONS; ESTIMATORS; BOOTSTRAP; UNIFORM; RECOVERY;
DOI
10.1214/22-SS135
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naive confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods.
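The undercoverage of naive confidence intervals mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the simulated example from the paper: the two-predictor design, the parameter values (n, rho, beta1, beta2), the known error variance of 1, and the function name simulate_coverage are all assumptions made for illustration. A z-test on the second coefficient decides whether to keep the full model or drop the second predictor, and a nominal 95% interval for the first coefficient is then computed as if the selected model had been fixed in advance.

```python
import math
import random


def simulate_coverage(n=50, rho=0.7, beta1=1.0, beta2=0.3, reps=2000, seed=0):
    """Estimate coverage of nominal 95% intervals for beta1.

    Returns (naive_coverage, full_model_coverage). The error variance is
    assumed known and equal to 1, so z-intervals are used throughout
    (a simplifying assumption of this sketch).
    """
    rng = random.Random(seed)
    z = 1.96  # approximate 97.5% standard normal quantile
    naive_hits = 0
    full_hits = 0
    for _ in range(reps):
        s11 = s12 = s22 = s1y = s2y = 0.0
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            # x2 is correlated with x1 (correlation rho)
            x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            y = beta1 * x1 + beta2 * x2 + rng.gauss(0.0, 1.0)
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            s1y += x1 * y
            s2y += x2 * y
        # Closed-form OLS for the full model y ~ x1 + x2 (no intercept)
        det = s11 * s22 - s12 ** 2
        b1_full = (s22 * s1y - s12 * s2y) / det
        b2_full = (s11 * s2y - s12 * s1y) / det
        se1_full = math.sqrt(s22 / det)
        se2_full = math.sqrt(s11 / det)
        full_hits += abs(b1_full - beta1) <= z * se1_full
        # Data-driven selection: keep x2 only if its z-test rejects
        if abs(b2_full) > z * se2_full:
            est, se = b1_full, se1_full
        else:
            # Restricted model y ~ x1
            est, se = s1y / s11, math.sqrt(1.0 / s11)
        # Naive interval: ignores that the model was chosen from the data
        naive_hits += abs(est - beta1) <= z * se
    return naive_hits / reps, full_hits / reps


if __name__ == "__main__":
    naive, full = simulate_coverage()
    print(f"naive post-selection coverage: {naive:.3f}")
    print(f"always-full-model coverage:    {full:.3f}")
```

Under these assumed settings the second coefficient is small relative to its standard error, so the restricted model is often selected and its estimator of beta1 carries an omitted-variable bias. The naive coverage should therefore fall noticeably below the nominal 95%, while the interval from the always-fitted full model stays close to it.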
Pages: 86-136
Page count: 51
Related papers
50 in total
  • [1] A bootstrap recipe for post-model-selection inference under linear regression models
    Lee, S. M. S.
    Wu, Y.
    [J]. BIOMETRIKA, 2018, 105 (04) : 873 - 890
  • [2] Post-Model-Selection Prediction Intervals for Generalized Linear Models
    Dustin, Dean
    Clarke, Bertrand
    [J]. SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY, 2024
  • [3] VALID POST-SELECTION INFERENCE IN MODEL-FREE LINEAR REGRESSION
    Kuchibhotla, Arun K.
    Brown, Lawrence D.
    Buja, Andreas
    Cai, Junhui
    George, Edward I.
    Zhao, Linda H.
    [J]. ANNALS OF STATISTICS, 2020, 48 (05) : 2953 - 2981
  • [4] Bayesian Post-Model-Selection Estimation
    Harel, Nadav
    Routtenberg, Tirza
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 175 - 179
  • [5] On Various Confidence Intervals Post-Model-Selection
    Leeb, Hannes
    Poetscher, Benedikt M.
    Ewald, Karl
    [J]. STATISTICAL SCIENCE, 2015, 30 (02) : 216 - 227
  • [6] THE IMPACT OF MODEL SELECTION ON INFERENCE IN LINEAR-REGRESSION
    HURVICH, CM
    TSAI, CL
    [J]. AMERICAN STATISTICIAN, 1990, 44 (03) : 214 - 217
  • [7] THE IMPACT OF MODEL SELECTION ON INFERENCE IN LINEAR-REGRESSION
    POTSCHER, BM
    [J]. AMERICAN STATISTICIAN, 1991, 45 (02) : 171 - 172
  • [8] Efficient and adaptive post-model-selection estimators
    Bühlmann, P
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1999, 79 (01) : 1 - 9
  • [9] INFERENCE AFTER VARIABLE SELECTION IN LINEAR-REGRESSION MODELS
    ZHANG, P
    [J]. BIOMETRIKA, 1992, 79 (04) : 741 - 746
  • [10] Post-Model-Selection Method for Density Estimation
    Wojtys, Malgorzata
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2011, 40 (17) : 3082 - 3098