Density-preserving projections for large-scale local anomaly detection

Cited by: 0
Authors
Timothy de Vries
Sanjay Chawla
Michael E. Houle
Affiliations
[1] University of Sydney, School of Information Technologies
[2] National Institute of Informatics
Keywords
Anomaly detection; Dimensionality reduction
DOI
Not available
Abstract
Outlier or anomaly detection is a fundamental data mining task that aims to identify data points, events, or transactions that deviate from the norm. The identification of outliers in data can provide insights about the underlying data-generating process. In general, outliers can be of two kinds: global and local. Global outliers are distinct with respect to the whole data set, while local outliers are distinct with respect to data points in their local neighbourhood. While several approaches have been proposed to scale up the process of global outlier discovery in large databases, this has not been the case for local outliers. We tackle this problem by optimising the use of the local outlier factor (LOF) for large and high-dimensional data. We propose projection-indexed nearest-neighbours (PINN), a novel technique that exploits extended nearest-neighbour sets in a reduced-dimensional space to create an accurate approximation of k-nearest-neighbour distances, which are used as the core density measurement within LOF. The reduced dimensionality allows for efficient indexing that is sub-quadratic in the number of items in the data set, where previously only quadratic performance was possible. A detailed theoretical analysis of random projection (RP) and PINN shows that we are able to preserve the density of the intrinsic manifold of the data set after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world data sets of up to 300,000 elements and 102,600 dimensions. A further investigation into the use of high-dimensionality-specific indexing such as the spatial approximate sample hierarchy (SASH) shows that our novel technique holds benefits over even these types of highly efficient indexing.
We cement the practical applications of our novel technique with insights into what it means to find local outliers in real data including image and text data, and include potential applications for this knowledge.
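The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian random projection, uses brute-force search in the projected space as a stand-in for a spatial index, and the names `pinn_knn`, `lof_scores`, `proj_dim` and the candidate multiplier `c` are hypothetical. Candidates are gathered in the reduced space, then re-ranked by exact distance in the original space, and the resulting k-nearest-neighbour distances feed a standard LOF computation.

```python
import numpy as np


def pinn_knn(X, k, proj_dim, c, seed=0):
    """Approximate k-NN: candidates from a projected space, refined in the original space."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Gaussian random projection (an assumed choice of projection matrix).
    R = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)
    Z = X @ R
    m = min(c * k, n - 1)  # size of the extended candidate set
    # Brute-force squared distances in the projected space; a real system
    # would query a spatial index here instead (assumption for brevity).
    sq = (Z ** 2).sum(axis=1)
    D_proj = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(D_proj, np.inf)  # a point is not its own neighbour
    cand = np.argpartition(D_proj, m - 1, axis=1)[:, :m]
    # Refinement: exact distances in the original space, candidates only.
    D_true = np.linalg.norm(X[:, None, :] - X[cand], axis=2)
    order = np.argsort(D_true, axis=1)[:, :k]
    nbrs = np.take_along_axis(cand, order, axis=1)
    dists = np.take_along_axis(D_true, order, axis=1)
    return nbrs, dists


def lof_scores(nbrs, dists):
    """Standard LOF computed from (approximate) k-NN indices and distances."""
    k_dist = dists[:, -1]  # distance to the k-th neighbour of each point
    # reach-dist(p, o) = max(k-dist(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(dists, k_dist[nbrs])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd
```

Calling `pinn_knn(X, k=10, proj_dim=10, c=3)` on an `(n, d)` array and passing the result to `lof_scores` gives one score per point; values well above 1 suggest a local outlier. The brute-force step keeps the sketch short but is itself quadratic, so the sub-quadratic behaviour described in the abstract depends on replacing it with an index over the projected points.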
Pages: 25-52
Number of pages: 27