Robust PCA for high-dimensional data

Cited by: 0
Authors:
Hubert, M [1]
Rousseeuw, PJ [1]
Verboven, S [1]
Affiliation:
[1] Catholic Univ Louvain, Dept Math, B-3000 Louvain, Belgium
Keywords: (none listed)
DOI: (not available)
Chinese Library Classification: O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes: 020208; 070103; 0714
Abstract:
Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix. The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
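The low-dimensional route mentioned in the abstract (apply a robust estimator of multivariate location and scatter, then take the eigenvectors of the scatter matrix) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses scikit-learn's `MinCovDet` (an implementation of the FAST-MCD algorithm of Rousseeuw and Van Driessen, 1999), and the simulated data, sample size, and contamination fraction are assumptions chosen for the demo.

```python
# Sketch of robust PCA for the n > p case: eigen-decompose a robust
# scatter matrix (here the MCD estimate, computed with scikit-learn's
# FAST-MCD implementation) instead of the empirical covariance matrix.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # simulated clean data, n = 200 >> p = 5
X[:10] += 10.0                  # contaminate a few observations (outliers)

# Robust location and scatter via MCD (FAST-MCD algorithm).
mcd = MinCovDet(random_state=0).fit(X)
robust_cov = mcd.covariance_

# Robust principal axes: eigenvectors sorted by decreasing eigenvalue.
vals, vecs = np.linalg.eigh(robust_cov)
order = np.argsort(vals)[::-1]
loadings = vecs[:, order]                 # robust loading matrix (p x p)
scores = (X - mcd.location_) @ loadings   # robust PC scores (n x p)
```

Because the MCD scatter estimate is driven by the majority of the data, the leading eigenvectors are far less attracted toward the outlying observations than those of the classical covariance matrix. Note that this route requires more observations than variables; the high-dimensional case (p > n) is exactly what the projection-pursuit and outlyingness-based methods of the paper address.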
Pages: 169-179
Page count: 11
Related papers
50 records in total
  • [41] Robust Hessian Locally Linear Embedding Techniques for High-Dimensional Data
    Xing, Xianglei
    Du, Sidan
    Wang, Kejun
    [J]. ALGORITHMS, 2016, 9 (02):
  • [42] Robust and sparse k-means clustering for high-dimensional data
    Brodinova, Sarka
    Filzmoser, Peter
    Ortner, Thomas
    Breiteneder, Christian
    Rohm, Maia
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 (04) : 905 - 932
  • [43] ROBUST NEAREST-NEIGHBOR METHODS FOR CLASSIFYING HIGH-DIMENSIONAL DATA
    Chan, Yao-Ban
    Hall, Peter
    [J]. ANNALS OF STATISTICS, 2009, 37 (6A): 3186 - 3203
  • [44] A Data-dependent Approach for High-dimensional (Robust) Wasserstein Alignment
    Ding H.
    Liu W.
    Ye M.
    [J]. ACM Journal of Experimental Algorithmics, 2023, 28 (1-2):
  • [45] Scale-Invariant Sparse PCA on High-Dimensional Meta-Elliptical Data
    Han, Fang
    Liu, Han
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (505) : 275 - 287
  • [46] PALLADIO: a parallel framework for robust variable selection in high-dimensional data
    Barbieri, Matteo
    Fiorini, Samuele
    Tomasi, Federico
    Barla, Annalisa
    [J]. PROCEEDINGS OF PYHPC2016: 6TH WORKSHOP ON PYTHON FOR HIGH-PERFORMANCE AND SCIENTIFIC COMPUTING, 2016, : 19 - 26
  • [47] Clustering High-Dimensional Data: A Reduction-Level Fusion of PCA and Random Projection
    Pasunuri, Raghunadh
    Venkaiah, Vadlamudi China
    Srivastava, Amit
    [J]. RECENT DEVELOPMENTS IN MACHINE LEARNING AND DATA ANALYTICS, 2019, 740 : 479 - 487
  • [48] A new robust covariance matrix estimation for high-dimensional microbiome data
    Wang, Jiyang
    Liang, Wanfeng
    Li, Lijie
    Wu, Yue
    Ma, Xiaoyan
    [J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2024, 66 (02) : 281 - 295
  • [49] ROBUST CLASSIFICATION OF HIGH-DIMENSIONAL DATA USING ARTIFICIAL NEURAL NETWORKS
    SMITH, DJ
    BAILEY, TC
    MUNFORD, AG
    [J]. STATISTICS AND COMPUTING, 1993, 3 (02) : 71 - 81