Test for high-dimensional outliers with principal component analysis

Cited by: 0
Authors
Nakayama, Yugo [1]
Yata, Kazuyoshi [2]
Aoshima, Makoto [2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Yoshida Honmachi,Sakyo Ku, Kyoto, Kyoto 6068501, Japan
[2] Univ Tsukuba, Inst Math, 1-1-1 Tennodai, Tsukuba, Ibaraki 3058571, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Consistency; Grubbs test; HDLSS; Outlier detection; PC score; Sample-size data; Effective PCA; Classification
DOI
10.1007/s42081-024-00255-0
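Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract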
We consider an outlier detection test for high-dimensional, low-sample-size (HDLSS) data. Although outlier detection is a fundamental problem, it has not been studied extensively in the HDLSS setting. We derive asymptotic properties of the first principal component scores in the presence of outliers, and apply these properties to the Grubbs test, a well-known method for testing outliers, to detect high-dimensional outliers. Our results indicate that the test statistic performs well in terms of both size and power. Using this test procedure, we propose an algorithm to identify multiple outliers. We investigate the theoretical properties of a sure independent screening and show that it can achieve complete identification of the outliers with high accuracy. Finally, we compare the performance of the proposed method with that of available outlier detection methods in HDLSS settings, using both numerical studies and real data analyses. The proposed method is superior not only in correctly detecting outliers but also in the number of false identifications.
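To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' procedure: it computes first principal component scores through the n x n dual (Gram) matrix and evaluates the classical two-sided Grubbs statistic on them. The function names `first_pc_scores` and `grubbs_statistic`, the simulated data, and the shift size are all assumptions chosen for illustration; the high-dimensional asymptotic results and the multiple-outlier algorithm described in the abstract are not reproduced here.

```python
import numpy as np


def first_pc_scores(X):
    """First principal component scores of an n x d data matrix X.

    Works with the n x n dual (Gram) matrix, which is cheap when d >> n.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center each variable
    gram = Xc @ Xc.T / (n - 1)               # n x n dual sample covariance
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    # Scores equal the leading eigenvector scaled by sqrt((n - 1) * eigenvalue);
    # the overall sign is arbitrary.
    return eigvecs[:, -1] * np.sqrt((n - 1) * eigvals[-1])


def grubbs_statistic(scores):
    """Classical two-sided Grubbs statistic and the index that attains it."""
    dev = np.abs(scores - scores.mean())
    idx = int(np.argmax(dev))
    return dev[idx] / scores.std(ddof=1), idx


# Hypothetical HDLSS example: n = 20 observations, d = 500 variables,
# with observation 7 shifted in every coordinate (illustrative values only).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))
X[7] += 3.0

G, idx = grubbs_statistic(first_pc_scores(X))
print(f"Grubbs statistic on first PC scores: {G:.3f} (observation {idx})")
```

In the classical low-dimensional Grubbs test, the statistic is compared with a critical value derived from the t distribution with n - 2 degrees of freedom. Whether that reference distribution remains appropriate for PC scores in the HDLSS setting is the question the paper addresses, so the sketch deliberately stops at the statistic.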
Pages: 28
Related papers
50 in total
  • [32] Principal component analysis for compositional data with outliers
    Filzmoser, Peter
    Hron, Karel
    Reimann, Clemens
    [J]. ENVIRONMETRICS, 2009, 20 (06) : 621 - 632
  • [33] MWPCR: Multiscale Weighted Principal Component Regression for High-Dimensional Prediction
    Zhu, Hongtu
    Shen, Dan
    Peng, Xuewei
    Liu, Leo Yufeng
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (519) : 1009 - 1021
  • [34] High-Dimensional Principal Projections
    Mas, Andre
    Ruymgaart, Frits
    [J]. COMPLEX ANALYSIS AND OPERATOR THEORY, 2015, 9 (01) : 35 - 63
  • [35] High-Dimensional Principal Projections
    André Mas
    Frits Ruymgaart
    [J]. Complex Analysis and Operator Theory, 2015, 9 : 35 - 63
  • [36] Asymptotic distribution of the LR statistic for equality of the smallest eigenvalues in high-dimensional principal component analysis
    Fujikoshi, Yasunori
    Yamada, Takayuki
    Watanabe, Daisuke
    Sugiyama, Takakazu
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2007, 98 (10) : 2002 - 2008
  • [37] Constrained principal component analysis with stochastically ordered scores for high-dimensional mass spectrometry data
    Hyun, Hyeong Jin
    Kim, Youngrae
    Kim, Sun Jo
    Kim, Joungyeon
    Lim, Johan
    Lim, Dong Kyu
    Kwon, Sung Won
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2021, 216
  • [38] CONSISTENCY OF AIC AND BIC IN ESTIMATING THE NUMBER OF SIGNIFICANT COMPONENTS IN HIGH-DIMENSIONAL PRINCIPAL COMPONENT ANALYSIS
    Bai, Zhidong
    Choi, Kwok Pui
    Fujikoshi, Yasunori
    [J]. ANNALS OF STATISTICS, 2018, 46 (03) : 1050 - 1076
  • [39] Tensor robust principal component analysis with total generalized variation for high-dimensional data recovery
    Xu, Zhi
    Yang, Jing-Hua
    Wang, Chuan-long
    Wang, Fusheng
    Yan, Xi-hong
    [J]. APPLIED MATHEMATICS AND COMPUTATION, 2024, 483
  • [40] Detecting and ranking outliers in high-dimensional data
    Kaur, Amardeep
    Datta, Amitava
    [J]. INTERNATIONAL JOURNAL OF ADVANCES IN ENGINEERING SCIENCES AND APPLIED MATHEMATICS, 2019, 11 (01) : 75 - 87