VALID POST-SELECTION INFERENCE IN MODEL-FREE LINEAR REGRESSION

Cited by: 17
Authors
Kuchibhotla, Arun K. [1]
Brown, Lawrence D. [1]
Buja, Andreas [1]
Cai, Junhui [1]
George, Edward I. [1]
Zhao, Linda H. [1]
Affiliations
[1] Univ Penn, Dept Stat, Wharton Sch, Philadelphia, PA 19104 USA
Source
ANNALS OF STATISTICS | 2020 / Vol. 48 / No. 05
Keywords
Simultaneous inference; multiplier bootstrap; uniform consistency; high-dimensional linear regression; concentration inequalities; Orlicz norms; model selection; BOOTSTRAP;
DOI
10.1214/19-AOS1917
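Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code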
020208 ; 070103 ; 0714 ;
Abstract
Modern data-driven approaches to modeling make extensive use of covariate/model selection. Such selection incurs a cost: it invalidates classical statistical inference. A conservative remedy to the problem was proposed by Berk et al. (Ann. Statist. 41 (2013) 802-837) and further extended by Bachoc, Preinerstorfer and Steinberger (2016). These proposals, labeled "PoSI methods," provide valid inference after arbitrary model selection. They are computationally NP-hard and have limitations in their theoretical justifications. We therefore propose computationally efficient confidence regions, named "UPoSI," and prove large-$p$ asymptotics for them. We do this for linear OLS regression allowing misspecification of the normal linear model, for both fixed and random covariates, and for independent as well as some types of dependent data. We start by proving a general equivalence result for the post-selection inference problem and a simultaneous inference problem in a setting that strips inessential features still present in a related result of Berk et al. (Ann. Statist. 41 (2013) 802-837). We then construct valid PoSI confidence regions that are the first to have vastly improved computational efficiency in that the required computation times grow only quadratically rather than exponentially with the total number $p$ of covariates. These are also the first PoSI confidence regions with guaranteed asymptotic validity when the total number of covariates $p$ diverges (almost exponentially) with the sample size $n$. Under standard tail assumptions, we only require $(\log p)^7 = o(n)$ and $k = o(\sqrt{n/\log p})$, where $k$ ($\le p$) is the largest number of covariates (model size) considered for selection. We study various properties of these confidence regions, including their Lebesgue measures, and compare them theoretically with those proposed previously.
Pages: 2953 - 2981
Number of pages: 29
Related papers
50 in total
  • [1] VALID POST-SELECTION INFERENCE
    Berk, Richard
    Brown, Lawrence
    Buja, Andreas
    Zhang, Kai
    Zhao, Linda
    [J]. ANNALS OF STATISTICS, 2013, 41 (02) : 802 - 837
  • [2] Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models
    Belloni, Alexandre
    Chernozhukov, Victor
    Kato, Kengo
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2019, 114 (526) : 749 - 758
  • [3] Exact Post-Selection Inference for Sequential Regression Procedures
    Tibshirani, Ryan J.
    Taylor, Jonathan
    Lockhart, Richard
    Tibshirani, Robert
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2016, 111 (514) : 600 - 614
  • [4] Post-Selection Inference
    Kuchibhotla, Arun K.
    Kolassa, John E.
    Kuffner, Todd A.
    [J]. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, 2022, 9 : 505 - 527
  • [5] Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach
    Chernozhukov, Victor
    Hansen, Christian
    Spindler, Martin
    [J]. ANNUAL REVIEW OF ECONOMICS, VOL 7, 2015, 7 : 649 - 688
  • [6] Approximately Valid and Model-Free Possibilistic Inference
    Cella, Leonardo
    Martin, Ryan
    [J]. BELIEF FUNCTIONS: THEORY AND APPLICATIONS (BELIEF 2021), 2021, 12915 : 127 - 136
  • [7] Post-selection inference in regression models for group testing data
    Shen, Qinyan
    Gregory, Karl
    Huang, Xianzheng
    [J]. BIOMETRICS, 2024, 80 (03)
  • [8] Post-Selection Inference for Generalized Linear Models With Many Controls
    Belloni, Alexandre
    Chernozhukov, Victor
    Wei, Ying
    [J]. JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2016, 34 (04) : 606 - 619
  • [9] Supplemental Appendix for "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach"
    Chernozhukov, Victor
    Hansen, Christian
    Spindler, Martin
    [J]. ANNUAL REVIEW OF ECONOMICS, VOL 7, 2015, 7
  • [10] Splitting strategies for post-selection inference
    Rasines, D. Garcia
    Young, G. A.
    [J]. BIOMETRIKA, 2023, 110 (03) : 597 - 614