Robust PCA for high-dimensional data

Cited by: 0
Authors
Hubert, M [1]
Rousseeuw, PJ [1]
Verboven, S [1]
Affiliations
[1] Catholic Univ Louvain, Dept Math, B-3000 Louvain, Belgium
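Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract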
Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix. The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
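The outlyingness idea mentioned in the abstract (Stahel, 1981; Donoho, 1982) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes random projection directions, uses the median and MAD as univariate location and scale, and substitutes a plain covariance of the least outlying points for FAST-MCD. The function names and the parameters `n_dirs` and `keep_frac` are hypothetical, chosen only for illustration.

```python
import numpy as np

def stahel_donoho_outlyingness(X, n_dirs=500, seed=0):
    """Approximate outlyingness: max over directions of |x'v - median| / MAD."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumption: random unit directions; the paper may select directions differently.
    V = rng.standard_normal((n_dirs, p))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    proj = X @ V.T                                # shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    mad = np.where(mad > 0, mad, np.inf)          # degenerate directions contribute 0
    return np.max(np.abs(proj - med) / mad, axis=1)

def robust_pca_sketch(X, n_components=2, keep_frac=0.75, n_dirs=500, seed=0):
    """PCA on the keep_frac least outlying observations (illustrative only)."""
    out = stahel_donoho_outlyingness(X, n_dirs=n_dirs, seed=seed)
    h = int(np.ceil(keep_frac * X.shape[0]))
    keep = np.argsort(out)[:h]                    # indices of the h least outlying points
    center = X[keep].mean(axis=0)
    cov = np.cov(X[keep], rowvar=False)           # stand-in for a FAST-MCD scatter estimate
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return center, eigvecs[:, order], eigvals[order], out
```

Calling `robust_pca_sketch(X)` on an `(n, p)` array returns the robust center, the loadings (one column per component), the corresponding eigenvalues, and the outlyingness of every observation, so that suspicious points can be inspected separately.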
Pages: 169-179
Number of pages: 11
Related papers
50 in total
  • [31] Robust and compact maximum margin clustering for high-dimensional data
    Cevikalp, Hakan
    Chome, Edward
    [J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (11) : 5981 - 6003
  • [32] Feature-Robust Optimal Transport for High-Dimensional Data
    Petrovich, Mathis
    Liang, Chao
    Sato, Ryoma
    Liu, Yanbin
    Tsai, Yao-Hung Hubert
    Zhu, Linchao
    Yang, Yi
    Salakhutdinov, Ruslan
    Yamada, Makoto
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT V, 2023, 13717 : 291 - 307
  • [33] Robust support vector machine for high-dimensional imbalanced data
    Nakayama, Yugo
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (05) : 1524 - 1540
  • [34] A general family of trimmed estimators for robust high-dimensional data
    Yang, Eunho
    Lozano, Aurelie C.
    Aravkin, Aleksandr
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02) : 3519 - 3553
  • [35] Robust statistical methods for high-dimensional data, with applications in tribology
    Pfeiffer, Pia
    Filzmoser, Peter
    [J]. ANALYTICA CHIMICA ACTA, 2023, 1279
  • [36] A Robust Supervised Variable Selection for Noisy High-Dimensional Data
    Kalina, Jan
    Schlenker, Anna
    [J]. BIOMED RESEARCH INTERNATIONAL, 2015, 2015
  • [37] Visualization Study of High-dimensional Data Classification Based on PCA-SVM
    Zhao Zhongwen
    Guo Huanghuang
    [J]. 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC), 2017 : 346 - 349
  • [38] High-dimensional data
    Geubbelmans, Melvin
    Rousseau, Axel-Jan
    Valkenborg, Dirk
    Burzykowski, Tomasz
    [J]. AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2023, 164 (03) : 453 - 456
  • [39] High-dimensional data
    Amaratunga, Dhammika
    Cabrera, Javier
    [J]. JOURNAL OF THE NATIONAL SCIENCE FOUNDATION OF SRI LANKA, 2016, 44 (01) : 3 - 9
  • [40] Robust estimation of the mean vector for high-dimensional data set using robust clustering
    Shahriari, Hamid
    Ahmadi, Orod
    [J]. JOURNAL OF APPLIED STATISTICS, 2015, 42 (06) : 1183 - 1205