Effect of outliers on the variable selection by the regularized regression

Cited by: 4
Authors
Jeong, Junho [1]
Kim, Choongrak [1]
Affiliations
[1] Pusan Natl Univ, Dept Stat, 2 Busandaehak Ro 63 Beon Gil, Busan 46241, South Korea
Keywords
high-dimension; influential observation; LASSO; outlier; regularization
DOI
10.29220/CSAM.2018.25.2.235
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
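As an illustration of the kind of case-deletion influence the abstract describes, the following is a minimal sketch and not the authors' analytic expression, which this record does not reproduce. It assumes a simple linear regression with no intercept, a penalty of the form 0.5 * RSS + lam * |b|, and no re-centering after a deletion; the function names and the toy data are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))

def lasso_simple(x, y, lam):
    """LASSO slope for a one-predictor, no-intercept model.

    Minimizes 0.5 * sum((y - b * x) ** 2) + lam * |b| (assumed form).
    """
    return soft_threshold(float(x @ y), lam) / float(x @ x)

def deletion_influence(x, y, lam):
    """Return the full-data slope and the slope with each observation deleted."""
    full = lasso_simple(x, y, lam)
    keep = ~np.eye(len(x), dtype=bool)  # row k masks out observation k
    loo = np.array([lasso_simple(x[m], y[m], lam) for m in keep])
    return full, loo

# Toy data (hypothetical): the last response is an outlier.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-0.2, -0.1, 0.0, 0.1, 3.0])
lam = 2.0

full, loo = deletion_influence(x, y, lam)
for k, b in enumerate(loo):
    print(f"delete obs {k}: slope = {b:.3f}, change = {b - full:+.3f}")
```

In this toy data the predictor is selected on the full sample, but deleting the last observation shrinks the slope exactly to zero, so a single point decides whether the variable enters the model.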
Pages: 235-243
Page count: 9
Related papers
50 in total
  • [41] Bayesian variable selection for regression models
    Kuo, L
    Mallick, B
    [J]. AMERICAN STATISTICAL ASSOCIATION - 1996 PROCEEDINGS OF THE SECTION ON BAYESIAN STATISTICAL SCIENCE, 1996, : 170 - 175
  • [42] Variable selection in wavelet regression models
    Alsberg, BK
    Woodward, AM
    Winson, MK
    Rowland, JJ
    Kell, DB
    [J]. ANALYTICA CHIMICA ACTA, 1998, 368 (1-2) : 29 - 44
  • [43] An adaptive method of variable selection in regression
    O'Gorman, Thomas W.
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2008, 37 (06) : 1129 - 1142
  • [44] Variable Selection in Logistic Regression Model
    Zhang Shangli
    Zhang Lili
    Qiu Kuanmin
    Lu Ying
    Cai Baigen
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2015, 24 (04) : 813 - 817
  • [45] A NONPARAMETRIC METHOD OF VARIABLE SELECTION FOR REGRESSION
    LUVALLE, MJ
    [J]. BIOMETRICS, 1983, 39 (04) : 1119 - 1119
  • [46] Variable selection in multivariate multiple regression
    Variyath, Asokan Mulayath
    Brobbey, Anita
    [J]. PLOS ONE, 2020, 15 (07):
  • [47] Variable selection for sparse logistic regression
    Zanhua Yin
    [J]. Metrika, 2020, 83 : 821 - 836
  • [48] Bayesian variable selection for logistic regression
    Tian, Yiqing
    Bondell, Howard D.
    Wilson, Alyson
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2019, 12 (05) : 378 - 393
  • [49] Variable Selection in Logistic Regression Model
    ZHANG Shangli
    ZHANG Lili
    QIU Kuanmin
    LU Ying
    CAI Baigen
    [J]. Chinese Journal of Electronics, 2015, 24 (04) : 813 - 817
  • [50] Pseudo estimation and variable selection in regression
    Wu, Wenbo
    Yin, Xiangrong
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2020, 208 : 25 - 35