Sparse PCA for High-Dimensional Data With Outliers

Cited by: 42
Authors
Hubert, Mia [1]
Reynkens, Tom [1]
Schmitt, Eric [1]
Verdonck, Tim [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Math, Leuven, Belgium
Keywords
Dimension reduction; Outlier detection; Robustness; Projection-pursuit approach; Principal components; Robust PCA
DOI
10.1080/00401706.2015.1093962
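Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract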
A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuit-based algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time.
Pages: 424-434
Number of pages: 11
Related papers
50 in total
  • [1] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3961 - 3966
  • [2] PCA learning for sparse high-dimensional data
    Hoyle, DC
    Rattray, M
    [J]. EUROPHYSICS LETTERS, 2003, 62 (01): : 117 - 123
  • [3] MINIMAX BOUNDS FOR SPARSE PCA WITH NOISY HIGH-DIMENSIONAL DATA
    Birnbaum, Aharon
    Johnstone, Iain M.
    Nadler, Boaz
    Paul, Debashis
    [J]. ANNALS OF STATISTICS, 2013, 41 (03): : 1055 - 1084
  • [4] Multiple outliers detection in sparse high-dimensional regression
    Wang, Tao
    Li, Qun
    Chen, Bin
    Li, Zhonghua
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (01) : 89 - 107
  • [5] Robust PCA for high-dimensional data
    Hubert, M
    Rousseeuw, PJ
    Verboven, S
    [J]. DEVELOPMENTS IN ROBUST STATISTICS, 2003, : 169 - 179
  • [6] Detecting and ranking outliers in high-dimensional data
    Kaur, Amardeep
    Datta, Amitava
    [J]. INTERNATIONAL JOURNAL OF ADVANCES IN ENGINEERING SCIENCES AND APPLIED MATHEMATICS, 2019, 11 (01) : 75 - 87
  • [7] Hiding outliers in high-dimensional data spaces
    Steinbuss G.
    Böhm K.
    [J]. International Journal of Data Science and Analytics, 2017, 4 (3) : 173 - 189
  • [8] Detecting and ranking outliers in high-dimensional data
    Amardeep Kaur
    Amitava Datta
    [J]. International Journal of Advances in Engineering Sciences and Applied Mathematics, 2019, 11 : 75 - 87
  • [9] Scale-Invariant Sparse PCA on High-Dimensional Meta-Elliptical Data
    Han, Fang
    Liu, Han
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (505) : 275 - 287
  • [10] On the anonymization of sparse high-dimensional data
    Ghinita, Gabriel
    Tao, Yufei
    Kalnis, Panos
    [J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 715 - +