Density-preserving projections for large-scale local anomaly detection

Authors
Timothy de Vries
Sanjay Chawla
Michael E. Houle
Affiliations
[1] University of Sydney, School of Information Technologies
[2] National Institute of Informatics
Keywords
Anomaly detection; Dimensionality reduction
Abstract
Outlier or anomaly detection is a fundamental data mining task that aims to identify data points, events, or transactions that deviate from the norm. The identification of outliers in data can provide insights about the underlying data-generating process. In general, outliers can be of two kinds: global and local. Global outliers are distinct with respect to the whole data set, while local outliers are distinct with respect to data points in their local neighbourhood. While several approaches have been proposed to scale up the process of global outlier discovery in large databases, this has not been the case for local outliers. We tackle this problem by optimising the use of the local outlier factor (LOF) for large and high-dimensional data. We propose projection-indexed nearest-neighbours (PINN), a novel technique that exploits extended nearest-neighbour sets in a reduced-dimensional space to create an accurate approximation of k-nearest-neighbour distances, which are used as the core density measurement within LOF. The reduced dimensionality allows for efficient indexing that is sub-quadratic in the number of items in the data set, where previously only quadratic performance was possible. A detailed theoretical analysis of random projection (RP) and PINN shows that we are able to preserve the density of the intrinsic manifold of the data set after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world data sets of up to 300,000 elements and 102,600 dimensions. A further investigation into the use of high-dimensionality-specific indexing such as the spatial approximate sample hierarchy (SASH) shows that our novel technique holds benefits over even these types of highly efficient indexing.
We cement the practical applications of our novel technique with insights into what it means to find local outliers in real data including image and text data, and include potential applications for this knowledge.
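The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_projection`, `pinn_knn`, `lof_scores`), the `extend` multiplier and the toy data are all assumptions made for illustration, and a brute-force scan stands in for the spatial index that would be built on the projected data. The idea shown is to find an extended candidate set of neighbours in the projected space, re-rank those candidates by their original-space distances, and feed the resulting k-nearest-neighbour distances into LOF.

```python
import math
import random


def random_projection(points, target_dim, rng):
    """Project points to target_dim dimensions with a Gaussian random matrix."""
    dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]


def nearest(query_idx, points, candidates, k):
    """Return the k candidates closest to points[query_idx] as (dist, idx) pairs."""
    q = points[query_idx]
    scored = sorted((math.dist(q, points[j]), j)
                    for j in candidates if j != query_idx)
    return scored[:k]


def pinn_knn(points, k, target_dim, extend=3, seed=0):
    """Approximate k-NN lists: candidates from the projection, distances from the original."""
    rng = random.Random(seed)
    projected = random_projection(points, target_dim, rng)
    everyone = range(len(points))
    result = []
    for i in everyone:
        # Step 1: extended candidate set of size extend * k in the projected
        # space (brute force here; an index would replace this scan).
        candidates = [j for _, j in nearest(i, projected, everyone, extend * k)]
        # Step 2: re-rank the candidates by their original-space distances.
        result.append(nearest(i, points, candidates, k))
    return result


def lof_scores(knn):
    """Compute LOF from per-point k-NN lists of (distance, index) pairs."""
    k_dist = [neigh[-1][0] for neigh in knn]
    lrd = []
    for neigh in knn:
        # Reachability distance from this point to each of its neighbours.
        reach = [max(d, k_dist[j]) for d, j in neigh]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    return [sum(lrd[j] for _, j in neigh) / (len(neigh) * lrd[i])
            for i, neigh in enumerate(knn)]


# Toy data (assumed): one Gaussian cluster in 20 dimensions plus one distant point.
data_rng = random.Random(42)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(60)]
points.append([25.0] * 20)

scores = lof_scores(pinn_knn(points, k=5, target_dim=5))
top = max(range(len(scores)), key=scores.__getitem__)
print("highest LOF at index", top)
```

In this sketch the cost saving would come from step 1, where a low-dimensional index could replace the linear scan; step 2 touches only `extend * k` original-space vectors per query.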
Pages: 25-52
Number of pages: 27