Resampling methods for variable selection in robust regression

Cited by: 24
Authors
Wisnowski, JW
Simpson, JR
Montgomery, DC
Runger, GC
Affiliations
[1] USAF Acad, DFMS, Dept Math Sci, Colorado Springs, CO 80840 USA
[2] Florida State Univ, Florida A&M Univ, Dept Ind & Mfg Engn, Tallahassee, FL 32310 USA
[3] Arizona State Univ, Dept Ind Engn, Tempe, AZ 85287 USA
Keywords
outliers; robust regression; variable selection; bootstrap; cross-validation
DOI
10.1016/S0167-9473(02)00235-9
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
With the inundation of large data sets requiring analysis and empirical model building, outliers have become commonplace. Fortunately, several standard statistical software packages have allowed practitioners to use robust regression estimators to easily fit data sets that are contaminated with outliers. However, little guidance is available for selecting the best subset of the predictor variables when using these robust estimators. We initially consider cross-validation and bootstrap resampling methods that have performed well for least-squares variable selection. It turns out that these variable selection methods cannot be directly applied to contaminated data sets using a robust estimation scheme. The prediction errors, inflated by the outliers, are not reliable measures of how well the robust model fits the data. As a result, new resampling variable selection methods are proposed by introducing alternative estimates of prediction error in the contaminated model. We demonstrate that, although robust estimation and resampling variable selection are computationally complex procedures, we can combine both techniques for superior results using modest computational resources. Monte Carlo simulation is used to evaluate the proposed variable selection procedures against alternatives through a designed experiment approach. The experiment factors include percentage of outliers, outlier geometry, bootstrap sample size, number of bootstrap samples, and cross-validation assessment size. The results are summarized and recommendations for use are provided. (C) 2002 Elsevier B.V. All rights reserved.
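The idea in the abstract, scoring candidate predictor subsets by resampling while using a prediction-error estimate that outliers cannot inflate, can be illustrated with a minimal sketch. This is not the authors' procedure. The Huber M-estimator fitted by iteratively reweighted least squares, the median absolute prediction error used as the outlier-resistant error measure, and all function names and parameter values below are assumptions chosen only for illustration.

```python
import itertools
import numpy as np

def huber_fit(X, y, c=1.345, n_iter=30):
    """Huber M-estimate via iteratively reweighted least squares (illustrative)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, downweighted outside it.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def cv_median_error(X, y, cols, n_splits=20, test_frac=0.25, seed=0):
    """Repeated random-split cross-validation scored by the median absolute
    prediction error (an assumed stand-in for an outlier-resistant measure)."""
    rng = np.random.default_rng(seed)   # same splits for every subset
    n = len(y)
    n_test = max(1, int(test_frac * n))
    Xs = np.column_stack([np.ones(n), X[:, list(cols)]])
    errs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta = huber_fit(Xs[train], y[train])
        errs.append(np.median(np.abs(y[test] - Xs[test] @ beta)))
    return float(np.mean(errs))

def select_subset(X, y, **kw):
    """Exhaustive search over non-empty predictor subsets."""
    p = X.shape[1]
    subsets = [s for k in range(1, p + 1)
               for s in itertools.combinations(range(p), k)]
    return min(subsets, key=lambda s: cv_median_error(X, y, s, **kw))

# Synthetic example: only columns 0 and 1 matter; 10% of responses are outliers.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5 * rng.normal(size=100)
y[:10] += 20.0                       # contaminate the response
best = select_subset(X, y)
print("selected columns:", best)
```

A mean squared prediction error in `cv_median_error` would be dominated by the contaminated responses whenever they fall in the assessment set, which is the failure mode the abstract describes. The median-based score is one simple alternative and is not necessarily the estimate proposed in the paper.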
Pages: 341-355
Page count: 15