Density-preserving projections for large-scale local anomaly detection

Authors
Timothy de Vries
Sanjay Chawla
Michael E. Houle
Affiliations
[1] University of Sydney, School of Information Technologies
[2] National Institute of Informatics
Keywords
Anomaly detection; Dimensionality reduction
DOI
Not available
Abstract
Outlier or anomaly detection is a fundamental data mining task that aims to identify data points, events, or transactions that deviate from the norm. Identifying outliers can provide insight into the underlying data-generating process. In general, outliers are of two kinds: global and local. Global outliers are distinct with respect to the whole data set, while local outliers are distinct with respect to the data points in their local neighbourhood. While several approaches have been proposed to scale up global outlier discovery in large databases, this has not been the case for local outliers. We tackle this problem by optimising the use of the local outlier factor (LOF) for large and high-dimensional data. We propose projection-indexed nearest-neighbours (PINN), a novel technique that exploits extended nearest-neighbour sets in a reduced-dimensional space to create an accurate approximation of k-nearest-neighbour distances, which serve as the core density measurement within LOF. The reduced dimensionality allows for efficient indexing that is sub-quadratic in the number of items in the data set, where previously only quadratic performance was possible. A detailed theoretical analysis of random projection (RP) and PINN shows that the density of the intrinsic manifold of the data set is preserved after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world data sets of up to 300,000 elements and 102,600 dimensions. A further investigation into high-dimensionality-specific indexing such as the spatial approximate sample hierarchy (SASH) shows that our technique holds benefits even over these highly efficient indexing methods. We cement the practical applications of our technique with insights into what it means to find local outliers in real data, including image and text data, and discuss potential applications of this knowledge.
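The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is reconstructed from the abstract alone and is not the authors' implementation: the function names, the Gaussian projection matrix, the candidate multiplier `c`, and the value of `target_dim` are all assumptions made for illustration. A brute-force search in the projected space stands in for the spatial index the paper relies on, so this version does not reproduce the sub-quadratic running time; it only shows the candidate-then-refine idea and how the resulting k-nearest-neighbour distances feed into LOF.

```python
import numpy as np

def random_projection(X, target_dim, rng):
    """Project the rows of X to target_dim using a scaled Gaussian random matrix."""
    R = rng.normal(size=(X.shape[1], target_dim)) / np.sqrt(target_dim)
    return X @ R

def pinn_style_knn(X, k, target_dim, c, rng):
    """Approximate k-NN (hypothetical helper): take c*k candidates in the
    projected space, then re-rank them by exact distance in the original space."""
    n = X.shape[0]
    Z = random_projection(X, target_dim, rng)
    m = min(c * k, n - 1)  # size of the extended candidate set
    # Brute-force squared distances in the projected space
    # (a low-dimensional spatial index would replace this step).
    sq = (Z ** 2).sum(axis=1)
    D_proj = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(D_proj, np.inf)  # a point is not its own neighbour
    cand = np.argpartition(D_proj, m - 1, axis=1)[:, :m]
    # Exact distances in the original space, restricted to the candidates.
    D_cand = np.linalg.norm(X[:, None, :] - X[cand], axis=2)
    order = np.argsort(D_cand, axis=1)[:, :k]
    idx = np.take_along_axis(cand, order, axis=1)
    dist = np.take_along_axis(D_cand, order, axis=1)
    return idx, dist

def lof_scores(idx, dist):
    """Local outlier factor computed from (approximate) k-NN indices and distances."""
    k_dist = dist[:, -1]                       # distance to the k-th neighbour
    reach = np.maximum(dist, k_dist[idx])      # reachability distance to each neighbour
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)   # local reachability density
    return lrd[idx].mean(axis=1) / lrd         # neighbours' density relative to own

# Toy data (made up for illustration): one Gaussian cluster plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 50)), np.full((1, 50), 10.0)])
idx, dist = pinn_style_knn(X, k=10, target_dim=10, c=3, rng=rng)
scores = lof_scores(idx, dist)
print("index with the largest LOF:", int(np.argmax(scores)))
```

Scores near 1 indicate points whose local density matches that of their neighbours; larger values indicate local outliers. Increasing `c` enlarges the candidate set and moves the result towards exact k-nearest-neighbour distances at the cost of more distance computations in the original space.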
Pages: 25-52
Number of pages: 27
Citation
de Vries, Timothy; Chawla, Sanjay; Houle, Michael E. Density-preserving projections for large-scale local anomaly detection. Knowledge and Information Systems, 2012, 32 (01): 25-52.