Stable prediction in high-dimensional linear models

Cited by: 18
Authors
Lin, Bingqing [1]
Wang, Qihua [1, 2]
Zhang, Jun [1]
Pang, Zhen [3]
Affiliations
[1] Shenzhen Univ, Inst Stat Sci, Coll Math & Stat, Shenzhen 518060, Peoples R China
[2] Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[3] Hong Kong Polytech Univ, Dept Appl Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Model averaging; Variable selection; Penalized regression; Screening; Regression
DOI
10.1007/s11222-016-9694-6
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
We propose a Random Splitting Model Averaging procedure, RSMA, to achieve stable predictions in high-dimensional linear models. The idea is to use split training data to construct and estimate candidate models, and to use test data to form a second-level data set. The second-level data set is used to estimate optimal weights for the candidate models by quadratic optimization under non-negativity constraints. This procedure has three appealing features: (1) RSMA avoids model overfitting and, as a result, gives improved prediction accuracy. (2) By adaptively choosing optimal weights, we obtain more stable predictions via averaging over several candidate models. (3) Based on RSMA, a weighted importance index is proposed to rank the predictors and so discriminate relevant predictors from irrelevant ones. Simulation studies and a real data analysis demonstrate that the RSMA procedure has excellent predictive performance and that the associated weighted importance index ranks the predictors well.
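The splitting-and-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how candidate models are built, how the second-level data are pooled across splits, or how the importance index is defined. The sketch assumes centered data (no intercept), nested candidate models chosen by marginal-covariance screening, second-level data stacked over all random splits, a simple projected-gradient solver for the non-negative quadratic problem, and an importance index taken as the weight-averaged inclusion frequency. The function names `fit_on_support`, `nonneg_weights`, and `rsma_sketch` are hypothetical.

```python
import numpy as np

def fit_on_support(X, y, support):
    """Least-squares fit using only the columns in `support`; other coefficients stay 0."""
    beta = np.zeros(X.shape[1])
    beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta

def nonneg_weights(Z, y, n_iter=5000):
    """Minimise ||y - Z w||^2 subject to w >= 0 by projected gradient descent.
    (Assumed solver; any non-negative least-squares routine would serve.)"""
    step = 1.0 / max(np.linalg.norm(Z, 2) ** 2, 1e-12)  # 1 / Lipschitz constant
    w = np.full(Z.shape[1], 1.0 / Z.shape[1])
    for _ in range(n_iter):
        w = np.maximum(w - step * (Z.T @ (Z @ w - y)), 0.0)
    return w

def rsma_sketch(X, y, sizes=(1, 2, 5, 10), n_splits=20, seed=0):
    """Illustrative random-splitting model averaging (assumptions in the lead-in)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_sum = np.zeros((len(sizes), p))   # running sum of candidate coefficients
    incl = np.zeros((len(sizes), p))       # how often each predictor enters each candidate
    Z_blocks, y_blocks = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        Xtr, ytr = X[train], y[train]
        # Screening on the training half: rank predictors by |marginal covariance with y|.
        score = np.abs((Xtr - Xtr.mean(axis=0)).T @ (ytr - ytr.mean()))
        order = np.argsort(-score)
        # Candidate k uses the top sizes[k] screened predictors.
        betas = np.array([fit_on_support(Xtr, ytr, order[:k]) for k in sizes])
        beta_sum += betas
        incl += (betas != 0)
        # Second-level data: candidate predictions on the held-out half.
        Z_blocks.append(X[test] @ betas.T)
        y_blocks.append(y[test])
    w = nonneg_weights(np.vstack(Z_blocks), np.concatenate(y_blocks))
    beta = w @ (beta_sum / n_splits)       # weighted average of split-averaged candidates
    importance = w @ (incl / n_splits)     # weight-averaged inclusion frequency
    return beta, w, importance
```

Calling `rsma_sketch(X, y)` returns the averaged coefficient vector, the non-negative candidate weights, and the illustrative importance scores; predictions for new rows are `X_new @ beta`. Because the weights are fitted only on held-out predictions, a candidate that overfits its training half earns little weight, which is the mechanism the abstract credits for avoiding overfitting.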
Pages: 1401 - 1412
Number of pages: 12
Related papers
50 in total
  • [1] Stable prediction in high-dimensional linear models
    Bingqing Lin
    Qihua Wang
    Jun Zhang
    Zhen Pang
    [J]. Statistics and Computing, 2017, 27 : 1401 - 1412
  • [2] Prediction intervals, factor analysis models, and high-dimensional empirical linear prediction
    Ding, AA
    Hwang, JTG
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1999, 94 (446) : 446 - 455
  • [3] Boosting for high-dimensional linear models
    Buhlmann, Peter
    [J]. ANNALS OF STATISTICS, 2006, 34 (02) : 559 - 583
  • [4] Optimal equivariant prediction for high-dimensional linear models with arbitrary predictor covariance
    Dicker, Lee H.
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2013, 7 : 1806 - 1834
  • [5] Prediction in abundant high-dimensional linear regression
    Cook, R. Dennis
    Forzani, Liliana
    Rothman, Adam J.
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2013, 7 : 3059 - 3088
  • [6] High-dimensional generalized linear models and the lasso
    van de Geer, Sara A.
    [J]. ANNALS OF STATISTICS, 2008, 36 (02) : 614 - 645
  • [7] Simultaneous Inference for High-Dimensional Linear Models
    Zhang, Xianyang
    Cheng, Guang
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (518) : 757 - 768
  • [8] Statistical significance in high-dimensional linear models
    Buehlmann, Peter
    [J]. BERNOULLI, 2013, 19 (04) : 1212 - 1242
  • [9] Variance estimation in high-dimensional linear models
    Dicker, Lee H.
    [J]. BIOMETRIKA, 2014, 101 (02) : 269 - 284
  • [10] High-dimensional inference in misspecified linear models
    Buehlmann, Peter
    van de Geer, Sara
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2015, 9 (01) : 1449 - 1473