Density-preserving projections for large-scale local anomaly detection

Authors
Timothy de Vries
Sanjay Chawla
Michael E. Houle
Affiliations
[1] University of Sydney, School of Information Technologies
[2] National Institute of Informatics
Keywords
Anomaly detection; Dimensionality reduction
Abstract
Outlier or anomaly detection is a fundamental data mining task that aims to identify data points, events, or transactions that deviate from the norm. The identification of outliers in data can provide insights about the underlying data generating process. In general, outliers can be of two kinds: global and local. Global outliers are distinct with respect to the whole data set, while local outliers are distinct with respect to data points in their local neighbourhood. While several approaches have been proposed to scale up the process of global outlier discovery in large databases, this has not been the case for local outliers. We tackle this problem by optimising the use of local outlier factor (LOF) for large and high-dimensional data. We propose projection-indexed nearest-neighbours (PINN), a novel technique that exploits extended nearest-neighbour sets in a reduced-dimensional space to create an accurate approximation of k-nearest-neighbour distances, which is used as the core density measurement within LOF. The reduced dimensionality allows for efficient indexing that is sub-quadratic in the number of items in the data set, where previously only quadratic performance was possible. A detailed theoretical analysis of random projection (RP) and PINN shows that we are able to preserve the density of the intrinsic manifold of the data set after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world data sets of up to 300,000 elements and 102,600 dimensions. A further investigation into the use of high-dimensionality-specific indexing such as spatial approximate sample hierarchy (SASH) shows that our novel technique holds benefits over even these types of highly efficient indexing.
We cement the practical applications of our novel technique with insights into what it means to find local outliers in real data including image and text data, and include potential applications for this knowledge.
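The pipeline outlined in the abstract (project, collect an extended neighbour set in the reduced space, re-rank by original-space distance, then compute LOF) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_projection`, `pinn_knn` and `lof`, the parameters `target_dim` and `candidates`, and the Gaussian form of the projection matrix are all assumptions made for illustration. A brute-force search in the projected space stands in for the spatial index, so this sketch remains quadratic and does not reproduce the sub-quadratic cost.

```python
import numpy as np


def random_projection(X, target_dim, rng):
    """Project rows of X to target_dim dimensions (assumed Gaussian matrix)."""
    d = X.shape[1]
    R = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
    return X @ R


def pinn_knn(X, k, target_dim, candidates, rng):
    """Approximate k-NN: take `candidates` neighbours in the projected space,
    then re-rank them by their true distance in the original space."""
    Z = random_projection(X, target_dim, rng)
    # Brute-force squared distances in the projected space
    # (a stand-in for an index; hypothetical simplification).
    sq = np.sum(Z ** 2, axis=1)
    d_proj = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(d_proj, np.inf)  # a point is not its own neighbour
    cand = np.argsort(d_proj, axis=1)[:, :candidates]
    # Re-rank the extended candidate set with original-space distances.
    d_orig = np.linalg.norm(X[:, None, :] - X[cand], axis=2)
    order = np.argsort(d_orig, axis=1)[:, :k]
    idx = np.take_along_axis(cand, order, axis=1)
    dist = np.take_along_axis(d_orig, order, axis=1)
    return idx, dist


def lof(idx, dist):
    """Local outlier factor from k-NN indices and ascending k-NN distances."""
    k_dist = dist[:, -1]  # k-distance of every point
    # reach-dist(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[idx], dist)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    return lrd[idx].mean(axis=1) / lrd
```

Calling `pinn_knn(X, k, target_dim, candidates, np.random.default_rng(0))` on an `(n, d)` array and passing the result to `lof` yields one score per row, with larger scores indicating stronger local outliers. Setting `candidates` to `n - 1` makes the re-ranking step exact, which is a convenient sanity check for the approximation.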
Pages: 25-52
Number of pages: 27