Robust PCA for high-dimensional data

Cited by: 0
Authors
Hubert, M [1 ]
Rousseeuw, PJ [1 ]
Verboven, S [1 ]
Affiliations
[1] Catholic Univ Louvain, Dept Math, B-3000 Louvain, Belgium
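Keywords
DOI
Not available
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code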
020208 ; 070103 ; 0714 ;
Abstract
Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix. The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
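
The notion of outlyingness mentioned in the abstract (Stahel, 1981; Donoho, 1982) can be illustrated with a small sketch. It is not the authors' algorithm: it only computes an approximate Stahel-Donoho outlyingness, here taken as the largest robust z-score |x'v - median| / MAD of a point over a set of projection directions v, and it omits the FAST-MCD step entirely. The function name `stahel_donoho_outlyingness`, the choice of directions through random pairs of data points, the number of directions, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def stahel_donoho_outlyingness(X, n_directions=500, seed=0):
    """Approximate Stahel-Donoho outlyingness of each row of X.

    Every point is projected onto each sampled direction v, and its
    robust z-score |x'v - median| / MAD is computed. The outlyingness
    of a point is its largest z-score over all sampled directions.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    # Assumption: directions are taken through random pairs of data
    # points, which stays feasible when variables outnumber observations.
    i = rng.integers(0, n, size=n_directions)
    j = rng.integers(0, n, size=n_directions)
    keep = i != j
    V = X[i[keep]] - X[j[keep]]

    # Normalise to unit length, dropping any zero-length directions.
    norms = np.linalg.norm(V, axis=1)
    nonzero = norms > 0
    V = V[nonzero] / norms[nonzero][:, None]

    proj = X @ V.T                                  # shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    ok = mad > 0                                    # avoid division by zero
    z = np.abs(proj[:, ok] - med[ok]) / mad[ok]
    return z.max(axis=1)


# Simulated example (hypothetical data): 40 observations, 100 variables,
# with the first 5 observations shifted to act as outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))
X[:5] += 10.0

scores = stahel_donoho_outlyingness(X)
flagged = np.argsort(scores)[-5:]   # indices of the 5 largest scores
print(sorted(int(k) for k in flagged))
```

In this simulated setting the shifted observations receive far larger scores than the rest, so a robust method could downweight or exclude them before estimating the principal components.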
Pages: 169-179
Number of pages: 11
Related papers
  • [1] Robust PCA for high-dimensional data based on characteristic transformation
    He, Lingyu
    Yang, Yanrong
    Zhang, Bo
    [J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2023, 65 (02) : 127 - 151
  • [2] Improved Algorithms for High-dimensional Robust PCA
    Lin, Xiaoyong
    Zhang, Zeqiu
    Wang, Jue
    Zhang, Zhaoyang
    Qiu, Tingting
    Mi, Zhengkun
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS WIRELESS BROADBAND (ICUWB2016), 2016,
  • [3] Outlier-Robust PCA: The High-Dimensional Case
    Xu, Huan
    Caramanis, Constantine
    Mannor, Shie
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (01) : 546 - 572
  • [4] PCA learning for sparse high-dimensional data
    Hoyle, DC
    Rattray, M
    [J]. EUROPHYSICS LETTERS, 2003, 62 (01) : 117 - 123
  • [5] Sparse PCA for High-Dimensional Data With Outliers
    Hubert, Mia
    Reynkens, Tom
    Schmitt, Eric
    Verdonck, Tim
    [J]. TECHNOMETRICS, 2016, 58 (04) : 424 - 434
  • [6] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007 : 3961 - 3966
  • [7] Optimally Weighted PCA for High-Dimensional Heteroscedastic Data
    Hong, David
    Yang, Fan
    Fessler, Jeffrey A.
    Balzano, Laura
    [J]. SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2023, 5 (01) : 222 - 250
  • [8] Asymptotic performance of PCA for high-dimensional heteroscedastic data
    Hong, David
    Balzano, Laura
    Fessler, Jeffrey A.
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 167 : 435 - 452
  • [9] Fast Robust Correlation for High-Dimensional Data
    Raymaekers, Jakob
    Rousseeuw, Peter J.
    [J]. TECHNOMETRICS, 2021, 63 (02) : 184 - 198
  • [10] Robust Ridge Regression for High-Dimensional Data
    Maronna, Ricardo A.
    [J]. TECHNOMETRICS, 2011, 53 (01) : 44 - 53