A MODEL-AVERAGING METHOD FOR HIGH-DIMENSIONAL REGRESSION WITH MISSING RESPONSES AT RANDOM

Cited by: 13
Authors
Xie, Jinhan [1]
Yan, Xiaodong [2]
Tang, Niansheng [1]
Affiliations
[1] Yunnan Univ, Key Lab Stat Modeling & Data Anal Yunnan Prov, Kunming 650500, Yunnan, Peoples R China
[2] Shandong Univ, Sch Econ, Jinan 250100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional data; missing at random; model averaging; multiple imputation; screening; weighted delete-one cross-validation; generalized linear models; empirical likelihood; variable selection
DOI
10.5705/ss.202018.0297
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
This study considers the ultrahigh-dimensional prediction problem in the presence of responses missing at random. A two-step model-averaging procedure is proposed to improve the prediction accuracy of the conditional mean of the response variable. The first step specifies several candidate models, each with low-dimensional predictors. To implement this step, a new feature-screening method is developed to distinguish between the active and inactive predictors. The method uses the multiple-imputation sure independence screening (MI-SIS) procedure, and candidate models are formed by grouping covariates with similar-sized MI-SIS values. The second step develops a new criterion to find the optimal weights for averaging a set of candidate models using weighted delete-one cross-validation (WDCV). Under some regularity conditions, we show that the proposed screening statistic enjoys the ranking consistency property, and that the WDCV criterion asymptotically achieves the lowest possible prediction loss. Simulation studies and an example demonstrate the proposed methodology.
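The two-step structure described in the abstract can be illustrated with a rough sketch. This is not the authors' MI-SIS or WDCV procedure, since the abstract does not define either statistic. As stand-ins, the sketch assumes absolute marginal correlation computed on the observed cases for screening, consecutive cuts of the ranked covariates for grouping, and unweighted delete-one cross-validation for ordinary least squares to choose averaging weights on the probability simplex. All function names, the group sizes, and the synthetic data are hypothetical.

```python
import numpy as np

def screen_scores(X, y, observed):
    # Assumption: absolute marginal correlation on observed cases,
    # used here as a simple stand-in for the MI-SIS statistic.
    Xo, yo = X[observed], y[observed]
    Xc = Xo - Xo.mean(axis=0)
    yc = yo - yo.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom

def group_by_rank(scores, n_models, group_size):
    # Rank covariates by score and cut the ranking into consecutive groups,
    # so each candidate model holds covariates with similar scores.
    order = np.argsort(-scores)
    return [order[k * group_size:(k + 1) * group_size] for k in range(n_models)]

def loo_predictions(Xg, y):
    # Delete-one OLS predictions via the leverage shortcut:
    # y_i - e_i / (1 - h_ii), with h_ii the diagonal of the hat matrix.
    Z = np.column_stack([np.ones(len(y)), Xg])
    Q, _ = np.linalg.qr(Z)
    h = (Q ** 2).sum(axis=1)
    resid = y - Q @ (Q.T @ y)
    return y - resid / (1.0 - h)

def simplex_weights(P, y, n_iter=500):
    # Frank-Wolfe with exact line search for min ||y - P w||^2
    # subject to w >= 0 and sum(w) = 1.
    w = np.full(P.shape[1], 1.0 / P.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * P.T @ (P @ w - y)
        d = -w.copy()
        d[np.argmin(grad)] += 1.0      # direction toward the best vertex
        Pd = P @ d
        curv = 2.0 * (Pd @ Pd)
        if curv <= 1e-12:
            break
        gamma = float(np.clip(-(grad @ d) / curv, 0.0, 1.0))
        if gamma == 0.0:
            break
        w += gamma * d
    return w

def fit_average(X, y, observed, n_models=3, group_size=2):
    # Step 1: screen and group. Step 2: weight candidates by delete-one CV.
    groups = group_by_rank(screen_scores(X, y, observed), n_models, group_size)
    Xo, yo = X[observed], y[observed]
    P = np.column_stack([loo_predictions(Xo[:, g], yo) for g in groups])
    w = simplex_weights(P, yo)
    preds = np.zeros(len(X))
    for g, wk in zip(groups, w):
        Z = np.column_stack([np.ones(len(yo)), Xo[:, g]])
        beta, *_ = np.linalg.lstsq(Z, yo, rcond=None)
        preds += wk * (np.column_stack([np.ones(len(X)), X[:, g]]) @ beta)
    return groups, w, preds

# Synthetic illustration: responses go missing depending on a covariate only.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(1.0 + X[:, 2])))
y_missing = np.where(observed, y, np.nan)
groups, weights, preds = fit_average(X, y_missing, observed)
print(groups[0], np.round(weights, 3))
```

Restricting the weights to the simplex keeps the averaged predictor a convex combination of the candidate fits. A version closer to the abstract would replace the complete-case screening with a multiply imputed statistic and add observation weights to the cross-validation criterion.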
Pages: 1005-1026
Page count: 22