A MODEL-AVERAGING METHOD FOR HIGH-DIMENSIONAL REGRESSION WITH MISSING RESPONSES AT RANDOM

Cited by: 13
Authors
Xie, Jinhan [1 ]
Yan, Xiaodong [2 ]
Tang, Niansheng [1 ]
Affiliations
[1] Yunnan Univ, Key Lab Stat Modeling & Data Anal Yunnan Prov, Kunming 650500, Yunnan, Peoples R China
[2] Shandong Univ, Sch Econ, Jinan 250100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional data; missing at random; model averaging; multiple imputation; screening; weighted delete-one cross-validation; GENERALIZED LINEAR-MODELS; EMPIRICAL LIKELIHOOD; VARIABLE SELECTION;
DOI
10.5705/ss.202018.0297
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
This study considers the ultrahigh-dimensional prediction problem in the presence of responses missing at random. A two-step model-averaging procedure is proposed to improve the prediction accuracy of the conditional mean of the response variable. The first step specifies several candidate models, each with low-dimensional predictors. To implement this step, a new feature-screening method is developed to distinguish between the active and inactive predictors. The method uses the multiple-imputation sure independence screening (MI-SIS) procedure, and candidate models are formed by grouping covariates with similar size MI-SIS values. The second step develops a new criterion to find the optimal weights for averaging a set of candidate models using weighted delete-one cross-validation (WDCV). Under some regularity conditions, we show that the proposed screening statistic enjoys the ranking consistency property, and that the WDCV criterion asymptotically achieves the lowest possible prediction loss. Simulation studies and an example demonstrate the proposed methodology.
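The two-step structure described in the abstract (screen and group predictors, then choose averaging weights by delete-one cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: a complete-case marginal correlation stands in for the MI-SIS statistic, the cross-validation criterion is unweighted rather than the paper's WDCV, the candidate models are ordinary least-squares fits, and the function names, simulated data, and Frank-Wolfe solver are all hypothetical choices.

```python
import numpy as np

def screen_and_group(X, y, observed, n_keep, group_size):
    """Rank predictors by |marginal correlation| on complete cases
    (a stand-in for MI-SIS), then split the top n_keep into
    consecutive groups of similar rank."""
    Xo, yo = X[observed], y[observed]
    Xc = Xo - Xo.mean(axis=0)
    yc = yo - yo.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top = np.argsort(-score)[:n_keep]
    return [top[i:i + group_size] for i in range(0, n_keep, group_size)]

def loo_predictions(Z, y):
    """Delete-one OLS predictions via the hat-matrix shortcut:
    y_i - e_i / (1 - h_ii), with h_ii the leverage of row i."""
    Q, _ = np.linalg.qr(Z)
    fitted = Q @ (Q.T @ y)
    leverage = np.sum(Q ** 2, axis=1)
    return y - (y - fitted) / (1.0 - leverage)

def cv_weights(P, y, n_iter=500):
    """Minimise ||y - P w||^2 over the probability simplex by
    Frank-Wolfe with exact line search, starting from the best
    single candidate so the loss can only decrease from there."""
    losses = np.sum((P - y[:, None]) ** 2, axis=0)
    w = np.zeros(P.shape[1])
    w[np.argmin(losses)] = 1.0
    for _ in range(n_iter):
        r = P @ w - y                  # current residual
        grad = P.T @ r                 # gradient direction (up to a factor 2)
        s = np.zeros_like(w)
        s[np.argmin(grad)] = 1.0       # best simplex vertex
        d = P @ (s - w)
        denom = d @ d
        if denom < 1e-15:
            break
        gamma = np.clip(-(r @ d) / denom, 0.0, 1.0)
        w = w + gamma * (s - w)
    return w

# Simulated data (hypothetical): 4 active predictors out of 500,
# with response missingness depending only on an observed covariate.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + rng.standard_normal(n)
prob_obs = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * X[:, 0])))
observed = rng.random(n) < prob_obs
y_mis = np.where(observed, y, np.nan)

groups = screen_and_group(X, y_mis, observed, n_keep=12, group_size=3)
yo = y_mis[observed]
designs = [np.column_stack([np.ones(yo.size), X[observed][:, g]]) for g in groups]
P = np.column_stack([loo_predictions(Z, yo) for Z in designs])
w = cv_weights(P, yo)
print("candidate groups:", [g.tolist() for g in groups])
print("averaging weights:", np.round(w, 3))
```

Starting the weight search at the best single candidate means the averaged delete-one loss is never worse than that of any individual candidate model, which mirrors, in a much weaker form, the optimality property the abstract claims for WDCV.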
Pages: 1005-1026
Page count: 22