Effect of outliers on the variable selection by the regularized regression

Cited by: 4
Authors
Jeong, Junho [1]
Kim, Choongrak [1]
Affiliations
[1] Pusan Natl Univ, Dept Stat, 2 Busandaehak Ro 63 Beon Gil, Busan 46241, South Korea
Keywords
high-dimension; influential observation; LASSO; outlier; regularization
DOI
10.29220/CSAM.2018.25.2.235
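Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract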
Many studies have examined the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are provided for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
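To make the idea of case influence concrete, the following is a minimal sketch and not the authors' derivation. It assumes the objective (1/2n) Σ (y_i − b0 − b·x_i)² + λ|b| with an unpenalized intercept, for which the LASSO slope in simple linear regression is the soft-thresholded sample covariance divided by the sample variance. Influence is measured here by brute-force refitting after deleting each observation, rather than by the analytic expression mentioned in the abstract. The function names and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """LASSO slope for simple linear regression with an unpenalized intercept.

    Assumed objective: (1/(2n)) * sum((y - b0 - b*x)^2) + lam * |b|.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    sxx = sum((xi - xbar) ** 2 for xi in x) / n
    return soft_threshold(sxy, lam) / sxx


def case_deletion_influence(x, y, lam):
    """Return the full-data slope and, for each k, slope(full) - slope(without k)."""
    full = lasso_slope(x, y, lam)
    diffs = []
    for k in range(len(x)):
        x_k = x[:k] + x[k + 1:]  # drop observation k
        y_k = y[:k] + y[k + 1:]
        diffs.append(full - lasso_slope(x_k, y_k, lam))
    return full, diffs


# Hypothetical toy data: the last response is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 20.0]

full, influence = case_deletion_influence(x, y, lam=0.5)
most_influential = max(range(len(x)), key=lambda k: abs(influence[k]))

# With a larger penalty, one observation can decide whether x is selected at all.
selected_with_outlier = lasso_slope(x, y, lam=2.5) != 0.0
selected_without_outlier = lasso_slope(x[:5], y[:5], lam=2.5) != 0.0

print(most_influential, selected_with_outlier, selected_without_outlier)
```

In this toy setting, deleting the outlying observation changes the slope far more than deleting any other case, and at the larger penalty it also switches the variable from selected to not selected. This mirrors, in one dimension, the kind of effect on variable selection that the abstract describes.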
Pages: 235-243
Number of pages: 9