Test for high-dimensional outliers with principal component analysis

Cited by: 0
Authors
Nakayama, Yugo [1]
Yata, Kazuyoshi [2]
Aoshima, Makoto [2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Yoshida Honmachi,Sakyo Ku, Kyoto, Kyoto 6068501, Japan
[2] Univ Tsukuba, Inst Math, 1-1-1 Tennodai, Tsukuba, Ibaraki 3058571, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Consistency; Grubbs test; HDLSS; Outlier detection; PC score; SAMPLE-SIZE DATA; EFFECTIVE PCA; CLASSIFICATION;
DOI
10.1007/s42081-024-00255-0
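Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract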
We consider a test for outlier detection in high-dimensional, low-sample-size (HDLSS) data. Although outlier detection is a fundamental problem, it has not been extensively studied in the HDLSS setting. We derive asymptotic properties of the first principal component scores in the presence of outliers, and we apply these properties to the Grubbs test, a well-known method for testing outliers, to detect high-dimensional outliers. Our results indicate that the test statistic performs well in terms of both size and power. Using this test procedure, we propose an algorithm to identify multiple outliers. We investigate the theoretical properties of a sure independent screening and show that it can achieve complete identification of the outliers with high accuracy. Finally, we investigate the performance in both numerical studies and real data analyses, in comparison with available outlier detection methods for HDLSS settings. The proposed method is superior not only in correctly detecting outliers but also in the number of false identifications.
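The general idea in the abstract, a Grubbs-type statistic computed on first principal component scores, can be illustrated with a minimal sketch. This is not the authors' procedure: it uses the classical Grubbs statistic and omits the asymptotic null distribution derived in the paper, so no critical value or p-value is produced. The function names `first_pc_scores` and `grubbs_statistic`, the power-iteration settings, and the use of the n x n dual covariance matrix as a shortcut when the dimension far exceeds the sample size are all assumptions made for illustration.

```python
import math
import random


def first_pc_scores(X, iters=200, seed=0):
    """Return first principal component scores for the rows of X.

    X is a list of n samples, each a list of d features. The n x n dual
    sample covariance matrix is used, which is cheap when d >> n.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Dual matrix S = Xc Xc^T / (n - 1); it shares its nonzero
    # eigenvalues with the d x d sample covariance matrix.
    S = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) / (n - 1)
          for k in range(n)] for i in range(n)]
    # Power iteration for the leading eigenpair (illustrative choice).
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        v = [sum(S[i][k] * u[k] for k in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in v))
        if lam == 0.0:
            raise ValueError("data have no variance")
        u = [x / lam for x in v]
    # The PC1 score of sample i equals sqrt((n - 1) * lambda_1) * u_i.
    scale = math.sqrt((n - 1) * lam)
    return [scale * x for x in u]


def grubbs_statistic(scores):
    """Classical two-sided Grubbs statistic and the index attaining it."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    devs = [abs(s - mean) for s in scores]
    idx = max(range(n), key=devs.__getitem__)
    return devs[idx] / sd, idx
```

To use the sketch, pass the data matrix to `first_pc_scores` and the resulting scores to `grubbs_statistic`; the returned statistic would then be compared against a critical value, either from the classical Grubbs table or, for the HDLSS setting, from the distribution given in the paper itself.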
Pages: 28