Effect of outliers on the variable selection by the regularized regression

Cited by: 4
Authors
Jeong, Junho [1 ]
Kim, Choongrak [1 ]
Affiliation
[1] Pusan Natl Univ, Dept Stat, 2 Busandaehak Ro 63 Beon Gil, Busan 46241, South Korea
Keywords
high-dimension; influential observation; LASSO; outlier; regularization
DOI
10.29220/CSAM.2018.25.2.235
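Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract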
Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of individual observations is more serious because the sample size n is much smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on the LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates, and on the variables selected by the LASSO, is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
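The idea of measuring the influence of the kth observation on a LASSO estimate can be illustrated with a minimal case-deletion sketch in Python. This is not the analytic expression derived in the paper, which the abstract does not reproduce. It assumes a simple linear regression without intercept, the objective (1/(2n)) Σ(y_i − b x_i)² + λ|b| (for which the slope has a closed-form soft-thresholding solution), and made-up toy data; the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """LASSO slope for a no-intercept simple regression (assumed setup).

    Minimizes (1 / (2n)) * sum((y_i - b * x_i)^2) + lam * |b|, whose
    solution is soft_threshold(mean(x*y), lam) / mean(x*x).
    """
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    return soft_threshold(sxy, lam) / sxx


def deletion_influence(x, y, lam):
    """Return the full-sample slope and, for each k, the change
    (full-sample slope) - (slope refitted without observation k)."""
    full = lasso_slope(x, y, lam)
    diffs = []
    for k in range(len(x)):
        x_k = x[:k] + x[k + 1:]
        y_k = y[:k] + y[k + 1:]
        diffs.append(full - lasso_slope(x_k, y_k, lam))
    return full, diffs


# Toy data, invented for illustration: the last point is an outlier.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [-2.1, -0.9, 0.1, 1.0, 2.2, -3.0]

full, diffs = deletion_influence(x, y, lam=0.5)
most_influential = max(range(len(diffs)), key=lambda k: abs(diffs[k]))
print("full-sample slope:", full)
print("most influential observation (0-based index):", most_influential)
```

On this toy data the full-sample slope is thresholded to exactly zero, so the variable is not selected, while deleting the last observation gives a slope of about 0.8. A single outlier can therefore change which variables the LASSO selects, not just the size of the estimates.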
Pages: 235-243
Number of pages: 9