SPARSE LEAST TRIMMED SQUARES REGRESSION FOR ANALYZING HIGH-DIMENSIONAL LARGE DATA SETS

Cited by: 152
Authors
Alfons, Andreas [1]
Croux, Christophe [1]
Gelper, Sarah [2]
Affiliations
[1] Katholieke Univ Leuven, Fac Business & Econ, ORSTAT Res Ctr, B-3000 Louvain, Belgium
[2] Erasmus Univ, Rotterdam Sch Management, NL-3000 Rotterdam, Netherlands
Source
ANNALS OF APPLIED STATISTICS | 2013 / Vol. 7 / No. 01
Keywords
Breakdown point; outliers; penalized regression; robust regression; trimming; VARIABLE SELECTION; MODEL SELECTION; LASSO; SHRINKAGE;
DOI
10.1214/12-AOAS575
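Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code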
020208 ; 070103 ; 0714 ;
Abstract
Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
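The abstract does not state the objective function. As a rough illustration only, the sketch below evaluates one common way of writing a trimmed, L1-penalized criterion: the sum of the h smallest squared residuals plus an L1 penalty on the coefficients. The function name, the argument names, the omission of an intercept, and the scaling of the penalty by h * lam are assumptions made for this example and are not taken from the record above.

```python
import numpy as np

def sparse_lts_objective(X, y, beta, h, lam):
    """Illustrative criterion: sum of the h smallest squared residuals
    plus an L1 penalty on beta, scaled by h * lam (assumed scaling)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Squared residuals for every observation (no intercept in this sketch).
    residuals_sq = (y - X @ beta) ** 2
    # Trimming: keep only the h smallest squared residuals, so the
    # n - h worst-fitting observations do not contribute to the loss.
    trimmed_loss = np.sort(residuals_sq)[:h].sum()
    # L1 penalty on the coefficients encourages sparse estimates.
    penalty = h * lam * np.abs(beta).sum()
    return trimmed_loss + penalty

# Usage: four observations, one of them a gross outlier in y.
X_demo = np.ones((4, 1))
y_demo = np.array([1.0, 1.0, 1.0, 100.0])
value = sparse_lts_objective(X_demo, y_demo, beta=[1.0], h=3, lam=0.5)
```

With h = 3 the outlying fourth observation is trimmed away, so only the penalty term contributes to the value in this toy example.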
Pages: 226 - 248
Page count: 23
Related papers
50 in total
  • [1] Sparse least trimmed squares regression with compositional covariates for high-dimensional data
    Monti, Gianna Serafina
    Filzmoser, Peter
    BIOINFORMATICS, 2021, 37 (21) : 3805 - 3814
  • [2] PARTIAL LEAST SQUARES PREDICTION IN HIGH-DIMENSIONAL REGRESSION
    Cook, R. Dennis
    Forzani, Liliana
    ANNALS OF STATISTICS, 2019, 47 (02) : 884 - 908
  • [3] Semivarying coefficient least-squares support vector regression for analyzing high-dimensional gene-environmental data
    Shim, Jooyong
    Hwang, Changha
    Jeong, Sunjoo
    Sohn, Insuk
    JOURNAL OF APPLIED STATISTICS, 2018, 45 (08) : 1370 - 1381
  • [4] HIGH-DIMENSIONAL GENERALIZATIONS OF ASYMMETRIC LEAST SQUARES REGRESSION AND THEIR APPLICATIONS
    Gu, Yuwen
    Zou, Hui
    ANNALS OF STATISTICS, 2016, 44 (06) : 2661 - 2694
  • [5] Least squares after model selection in high-dimensional sparse models
    Belloni, Alexandre
    Chernozhukov, Victor
    BERNOULLI, 2013, 19 (02) : 521 - 547
  • [6] L1 least squares for sparse high-dimensional LDA
    Li, Yanfang
    Jia, Jinzhu
    ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (01) : 2499 - 2518
  • [7] Least trimmed squares regression, least median squares regression, and mathematical programming
    Giloni, A
    Padberg, M
    MATHEMATICAL AND COMPUTER MODELLING, 2002, 35 (9-10) : 1043 - 1060
  • [8] Partial least trimmed squares regression
    Xie, Zhonghao
    Feng, Xi'an
    Chen, Xiaojing
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2022, 221
  • [9] Thresholding least-squares inference in high-dimensional regression models
    Giurcanu, Mihai
    ELECTRONIC JOURNAL OF STATISTICS, 2016, 10 (02) : 2124 - 2156
  • [10] Sign-constrained least squares estimation for high-dimensional regression
    Meinshausen, Nicolai
    ELECTRONIC JOURNAL OF STATISTICS, 2013, 7 : 1607 - 1631