A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data

Cited by: 39
Authors
Ding, Qi [1]
Kolaczyk, Eric D. [1]
Affiliation
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
Funding
U.S. National Science Foundation
Keywords
Anomaly detection; principal component analysis; random projection; PRINCIPAL; MATRICES; NOISE;
DOI
10.1109/TIT.2013.2278017
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Random projection is widely used as a method of dimension reduction. In recent years, its combination with standard techniques of regression and classification has been explored. Here, we examine its use for anomaly detection in high-dimensional settings, in conjunction with principal component analysis (PCA) and corresponding subspace detection methods. We assume a so-called spiked covariance model for the underlying data generation process and a Gaussian random projection. We adopt a hypothesis testing perspective of the anomaly detection problem, with the test statistic defined to be the magnitude of the residuals of a PCA analysis. Under the null hypothesis of no anomaly, we characterize the relative accuracy with which the mean and variance of the test statistic from compressed data approximate those of the corresponding test statistic from uncompressed data. Furthermore, under a suitable alternative hypothesis, we provide expressions that allow for a comparison of statistical power for detection. Finally, whereas these results correspond to the ideal setting in which the data covariance is known, we show that it is possible to obtain the same order of accuracy when the covariance of the compressed measurements is estimated using a sample covariance, as long as the number of measurements is of the same order of magnitude as the reduced dimensionality. We illustrate the practical impact of our results in the context of predicting volume anomalies in Internet traffic data.
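The setting described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a spiked covariance Sigma = U diag(spikes) U^T + sigma2 I, a Gaussian projection Phi with i.i.d. N(0, 1/m) entries, and uses the squared norm of the PCA residual as the test statistic. All function names, dimensions, spike values, the zero-mean sample covariance, and the quantile threshold in detect are hypothetical choices for illustration, not the authors' code or parameters.

```python
import numpy as np


def spiked_covariance(p, spikes, sigma2=1.0, seed=0):
    """Illustrative spiked covariance Sigma = U diag(spikes) U^T + sigma2 * I.

    U is a random p x k matrix with orthonormal columns. Returns (Sigma, U).
    """
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((p, len(spikes))))
    return U @ np.diag(spikes) @ U.T + sigma2 * np.eye(p), U


def residual_statistic(X, cov, k):
    """Squared norm ||(I - V V^T) x||^2 for each row x of X, where V holds
    the top-k eigenvectors of cov (the PCA principal subspace)."""
    X = np.atleast_2d(X)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come in ascending order
    V = eigvecs[:, -k:]                # top-k principal directions
    R = X - (X @ V) @ V.T              # residual after projecting out V
    return np.sum(R ** 2, axis=1)


def gaussian_projection(m, p, seed=1):
    """Hypothetical m x p random projection with i.i.d. N(0, 1/m) entries."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, p)) / np.sqrt(m)


def detect(t, t_null, alpha=0.05):
    """Flag statistics above the empirical (1 - alpha) quantile of null values."""
    return t > np.quantile(t_null, 1 - alpha)


def simulate(p=500, m=100, n=2000, spikes=(50.0, 30.0, 20.0), seed=0):
    """Null residual statistics for: uncompressed data with known Sigma,
    compressed data with known Phi Sigma Phi^T, and compressed data with a
    sample covariance (mean assumed to be zero)."""
    rng = np.random.default_rng(seed + 2)
    k = len(spikes)
    Sigma, _ = spiked_covariance(p, spikes, seed=seed)
    L = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, p)) @ L.T          # rows ~ N(0, Sigma)
    Phi = gaussian_projection(m, p, seed=seed + 1)
    Y = X @ Phi.T                                  # compressed rows y = Phi x

    t_full = residual_statistic(X, Sigma, k)
    t_comp = residual_statistic(Y, Phi @ Sigma @ Phi.T, k)
    S_hat = Y.T @ Y / n                            # sample covariance of Y
    t_hat = residual_statistic(Y, S_hat, k)
    return t_full, t_comp, t_hat
```

For example, t_full, t_comp, t_hat = simulate() returns three arrays of null statistics; comparing their sample means and variances mirrors, informally, the comparison between compressed and uncompressed statistics that the abstract describes, and detect(t_new, t_comp) applies a simple empirical threshold to new compressed observations.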
Pages: 7419-7433
Number of pages: 15
Related papers (50 in total)
  • [1] Weighted subspace anomaly detection in high-dimensional space
    Tu, Jiankai
    Liu, Huan
    Li, Chunguang
    [J]. PATTERN RECOGNITION, 2024, 146
  • [2] Anomaly Detection in High-Dimensional Data
    Talagala, Priyanga Dilini
    Hyndman, Rob J.
    Smith-Miles, Kate
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2021, 30 (02) : 360 - 374
  • [3] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3961 - 3966
  • [4] An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection
    Zhang, Liangwei
    Lin, Jing
    Karim, Ramin
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 2015, 142 : 482 - 497
  • [5] Anomaly detection in mixed high-dimensional molecular data
    Buck, Lena
    Schmidt, Tobias
    Feist, Maren
    Schwarzfischer, Philipp
    Kube, Dieter
    Oefner, Peter J.
    Zacharias, Helena U.
    Altenbuchinger, Michael
    Dettmer, Katja
    Gronwald, Wolfram
    Spang, Rainer
    [J]. BIOINFORMATICS, 2023, 39 (08)
  • [6] High-Dimensional Matched Subspace Detection When Data are Missing
    Balzano, Laura
    Recht, Benjamin
    Nowak, Robert
    [J]. 2010 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2010, : 1638 - 1642
  • [7] Robust PCA for high-dimensional data
    Hubert, M
    Rousseeuw, PJ
    Verboven, S
    [J]. DEVELOPMENTS IN ROBUST STATISTICS, 2003, : 169 - 179
  • [8] Anomaly Detection in High-Dimensional Data Based on Autoregressive Flow
    Yu, Yanwei
    Lv, Peng
    Tong, Xiangrong
    Dong, Junyu
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT II, 2020, 12113 : 125 - 140
  • [9] CASOS: a Subspace Method for Anomaly Detection in High Dimensional Astronomical Databases
    Henrion, Marc
    Hand, David J.
    Gandy, Axel
    Mortlock, Daniel J.
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2013, 6 (01) : 53 - 72
  • [10] Subspace rotations for high-dimensional outlier detection
    Chung, Hee Cheol
    Ahn, Jeongyoun
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 183