Effect of outliers on the variable selection by the regularized regression

Cited by: 4
Authors
Jeong, Junho [1]
Kim, Choongrak [1]
Affiliation
[1] Pusan Natl Univ, Dept Stat, 2 Busandaehak Ro 63 Beon Gil, Busan 46241, South Korea
Keywords
high-dimension; influential observation; LASSO; outlier; regularization
DOI
10.29220/CSAM.2018.25.2.235
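Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract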
Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on the LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are provided for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
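As a rough illustration of what "influence of the kth observation on the LASSO estimate in simple linear regression" can mean, the sketch below computes case-deletion changes in the LASSO slope numerically. It is not the analytic expression derived in the paper. The objective scaling (0.5 times the residual sum of squares plus lam times the absolute slope), the unpenalized intercept, the function names, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """LASSO slope for simple linear regression with an unpenalized intercept.

    Minimizes 0.5 * sum((y_i - b0 - b1 * x_i)^2) + lam * |b1|
    (an assumed scaling). Centering x and y removes the intercept.
    """
    xbar, ybar = mean(x), mean(y)
    xc = [xi - xbar for xi in x]
    yc = [yi - ybar for yi in y]
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    return soft_threshold(sxy, lam) / sxx


def case_deletion_influence(x, y, lam):
    """Return the full-data slope and the slope change when each case is deleted."""
    full = lasso_slope(x, y, lam)
    changes = []
    for k in range(len(x)):
        x_k = x[:k] + x[k + 1:]  # data without observation k
        y_k = y[:k] + y[k + 1:]
        changes.append(full - lasso_slope(x_k, y_k, lam))
    return full, changes


# Toy data, made up for illustration: y is roughly 2x, the last point is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 0.0]

full, changes = case_deletion_influence(x, y, lam=1.0)
most_influential = max(range(len(x)), key=lambda k: abs(changes[k]))
print(f"full-data slope: {full:.3f}")
print(f"most influential case index: {most_influential}")
```

On this toy data the outlying last observation produces by far the largest change in the slope. The same data also hint at the selection effect the abstract describes: with `lam=5.0`, `lasso_slope(x, y, 5.0)` is thresholded to zero, whereas `lasso_slope(x[:5], y[:5], 5.0)` is nonzero, so a single observation decides whether the variable is selected.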
Pages: 235-243
Page count: 9