Effect of outliers on the variable selection by the regularized regression

Cited by: 4
Authors
Jeong, Junho [1]
Kim, Choongrak [1]
Affiliation
[1] Pusan Natl Univ, Dept Stat, 2 Busandaehak Ro 63 Beon Gil, Busan 46241, South Korea
Keywords
high-dimension; influential observation; LASSO; outlier; regularization
DOI
10.29220/CSAM.2018.25.2.235
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the k-th observation on the LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
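The case-deletion idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's analytic expression. It assumes a simple linear regression without intercept and the objective 0.5 * sum((y_i - b * x_i)^2) + lam * |b|, whose minimizer is a soft-thresholded least-squares slope. The function names (`soft_threshold`, `lasso_slope`, `deletion_influence`) and the toy data are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """Minimizer of 0.5 * sum((y_i - b * x_i)**2) + lam * |b| (assumed form, no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx


def deletion_influence(x, y, lam):
    """Return the full-data estimate and, for each k, full estimate minus estimate without case k."""
    full = lasso_slope(x, y, lam)
    influence = []
    for k in range(len(x)):
        x_k = x[:k] + x[k + 1:]  # data with the k-th observation removed
        y_k = y[:k] + y[k + 1:]
        influence.append(full - lasso_slope(x_k, y_k, lam))
    return full, influence


# Made-up toy data: the last observation is an outlier in y.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-2.1, -0.9, 0.1, 1.0, 6.0]

full, influence = deletion_influence(x, y, lam=1.0)
most_influential = max(range(len(x)), key=lambda k: abs(influence[k]))

# With a larger penalty, deleting the outlier shrinks the slope to exactly zero,
# so the variable is selected with the outlier and dropped without it.
selected_with_outlier = lasso_slope(x, y, lam=7.0) != 0.0
selected_without_outlier = lasso_slope(x[:-1], y[:-1], lam=7.0) != 0.0
```

In this toy setting a single observation changes both the size of the estimate and whether the variable is selected at all, which is the kind of effect the abstract describes.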
Pages: 235-243
Number of pages: 9
Related papers
50 in total
  • [41] Bayesian variable selection in quantile regression
    Yu, Keming
    Chen, Cathy W. S.
    Reed, Craig
    Dunson, David B.
    STATISTICS AND ITS INTERFACE, 2013, 6 (02) : 261 - 274
  • [42] Variable selection in wavelet regression models
    Alsberg, BK
    Woodward, AM
    Winson, MK
    Rowland, JJ
    Kell, DB
    ANALYTICA CHIMICA ACTA, 1998, 368 (1-2) : 29 - 44
  • [43] Variable selection in regression with compositional covariates
    Lin, Wei
    Shi, Pixu
    Feng, Rui
    Li, Hongzhe
    BIOMETRIKA, 2014, 101 (04) : 785 - 797
  • [44] Variable selection in multivariate multiple regression
    Variyath, Asokan Mulayath
    Brobbey, Anita
    PLOS ONE, 2020, 15 (07):
  • [45] Bayesian variable selection for logistic regression
    Tian, Yiqing
    Bondell, Howard D.
    Wilson, Alyson
    STATISTICAL ANALYSIS AND DATA MINING, 2019, 12 (05) : 378 - 393
  • [46] Variable selection for sparse logistic regression
    Yin, Zanhua
    METRIKA, 2020, 83 : 821 - 836
  • [47] An adaptive method of variable selection in regression
    O'Gorman, Thomas W.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2008, 37 (06) : 1129 - 1142
  • [48] Variable Selection in Logistic Regression Model
    Zhang Shangli
    Zhang Lili
    Qiu Kuanmin
    Lu Ying
    Cai Baigen
    CHINESE JOURNAL OF ELECTRONICS, 2015, 24 (04) : 813 - 817
  • [49] A NONPARAMETRIC METHOD OF VARIABLE SELECTION FOR REGRESSION
    LUVALLE, MJ
    BIOMETRICS, 1983, 39 (04) : 1119 - 1119