Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Cited by: 0
Authors
d'Orsi, Tommaso [1]
Liu, Chih-Hung [1]
Nasser, Rajai [1]
Novikov, Gleb [1]
Steurer, David [1]
Tiegel, Stefan [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
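Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract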
We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d)/\alpha^2$ and optimal error rate $O\big(\sqrt{(k \log d)/(n \cdot \alpha^2)}\big)$, where $n$ is the number of observations, $d$ is the number of dimensions, and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log\log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only been shown in the dense setting (i.e., general linear regression) very recently by d'Orsi et al. (dNS21). In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi et al.'s approach (dNS21) in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems. We complement these algorithmic results with statistical lower bounds showing that the fraction of inliers that our PCA estimator can deal with is optimal up to a constant factor.
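To make the sparse-regression estimator in the abstract more concrete, the following is a minimal sketch of Huber loss combined with an l1 penalty, minimized by proximal gradient descent (ISTA). It is an illustration only, not the authors' algorithm or analysis: the solver, the function names (`huber_grad`, `soft_threshold`, `huber_lasso`), the Huber threshold `delta`, the penalty weight `lam`, and the synthetic data are all assumptions chosen for the example. The objective assumed here is (1/n) * sum_i Huber_delta(x_i . beta - y_i) + lam * ||beta||_1.

```python
import numpy as np

def huber_grad(r, delta):
    """Derivative of the Huber loss in the residual r: r clipped to [-delta, delta]."""
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.0, n_iter=500):
    """Proximal gradient descent on the Huber loss plus an l1 penalty (a sketch)."""
    n, d = X.shape
    # The Huber loss has second derivative at most 1, so the gradient of the
    # smooth part is Lipschitz with constant at most ||X||_2^2 / n.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = X @ beta - y
        grad = X.T @ huber_grad(residual, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Hypothetical synthetic experiment (all sizes and noise levels are assumptions).
rng = np.random.default_rng(0)
n, d, k = 1000, 50, 3
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:k] = 5.0

# Inliers get small noise; about half of the responses are corrupted by large
# noise drawn independently of X, mimicking an oblivious adversary.
noise = 0.1 * rng.standard_normal(n)
corrupted = rng.random(n) < 0.5
noise[corrupted] = rng.uniform(-100.0, 100.0, size=corrupted.sum())
y = X @ beta_true + noise

beta_hat = huber_lasso(X, y, lam=0.1, delta=1.0)
err = np.linalg.norm(beta_hat - beta_true)
print("l2 estimation error:", err)
```

Because the Huber gradient is clipped, each corrupted response can pull the estimate by at most `delta`, regardless of how large the corruption is. The same template would apply to the PCA setting by swapping the l1 proximal step for singular value shrinkage (the proximal operator of the nuclear norm), again as an assumption about how one might implement it rather than a description of the paper's procedure.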
Pages: 12
Related papers
50 in total
  • [1] Consistent regression when oblivious outliers overwhelm
    d'Orsi, Tommaso
    Novikov, Gleb
    Steurer, David
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] Noise Statistics Oblivious GARD For Robust Regression With Sparse Outliers
    Kallummil, Sreejith
    Kalyani, Sheetal
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (02) : 383 - 398
  • [3] Sparse PCA from Sparse Linear Regression
    Bresler, Guy
    Park, Sung Min
    Persu, Madalina
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] Sparse PCA for High-Dimensional Data With Outliers
    Hubert, Mia
    Reynkens, Tom
    Schmitt, Eric
    Verdonck, Tim
    [J]. TECHNOMETRICS, 2016, 58 (04) : 424 - 434
  • [5] Sparse regression for large data sets with outliers
    Bottmer, Lea
    Croux, Christophe
    Wilms, Ines
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 297 (02) : 782 - 794
  • [6] Robust State Estimation with Sparse Outliers
    Graham, Matthew C.
    How, Jonathan P.
    Gustafson, Donald E.
    [J]. JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2015, 38 (07) : 1229 - 1240
  • [7] Robust Estimation of Regression Coefficients with Outliers
    Ampanthong, Pimpan
    Suwattee, Prachoom
    [J]. THAILAND STATISTICIAN, 2010, 8 (02) : 183 - 205
  • [8] Multiple outliers detection in sparse high-dimensional regression
    Wang, Tao
    Li, Qun
    Chen, Bin
    Li, Zhonghua
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (01) : 89 - 107
  • [9] Estimation of regression parameters in the presence of outliers in the response
    Sen Roy, Sugata
    Guria, Sibnarayan
    [J]. STATISTICS, 2009, 43 (06) : 531 - 539
  • [10] SPARSE PCA: OPTIMAL RATES AND ADAPTIVE ESTIMATION
    Cai, T. Tony
    Ma, Zongming
    Wu, Yihong
    [J]. ANNALS OF STATISTICS, 2013, 41 (06) : 3074 - 3110