A geometric framework for outlier detection in high-dimensional data

被引:0
|
作者
Herrmann, Moritz [1 ]
Pfisterer, Florian [1 ]
Scheipl, Fabian [1 ]
机构
[1] Ludwig Maximilians Univ Munchen, Dept Stat, Ludwigstr 33, D-80539 Munich, Germany
关键词
anomaly detection; dimension reduction; manifold learning; outlier detection; REDUCTION;
D O I
10.1002/widm.1491
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework which exploits the metric structure of a data set. Our approach rests on the manifold assumption, that is, that the observed, nominally high-dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high dimensional data. We also suggest a novel, mathematically precise and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high-dimensional data, but the framework we propose is completely general and we include image and graph data applications. Our results show that the outlier structure of highdimensional and non-tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors. This article is categorized under: Technologies > Structure Discovery and Clustering Fundamental Concepts of Data and Knowledge > Data Concepts Technologies > Visualization
引用
下载
收藏
页数:20
相关论文
共 50 条
  • [1] Outlier detection for high-dimensional data
    Ro, Kwangil
    Zou, Changliang
    Wang, Zhaojun
    Yin, Guosheng
    BIOMETRIKA, 2015, 102 (03) : 589 - 599
  • [2] Intrinsic dimensional outlier detection in high-dimensional data
    Von Brünken, Jonathan
    Houle, Michael E.
    Zimek, Arthur
    NII Technical Reports, 2015, (03): : 1 - 12
  • [3] Efficient Outlier Detection for High-Dimensional Data
    Liu, Huawen
    Li, Xuelong
    Li, Jiuyong
    Zhang, Shichao
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (12): : 2451 - 2461
  • [4] A Comparison of Outlier Detection Techniques for High-Dimensional Data
    Xu, Xiaodan
    Liu, Huawen
    Li, Li
    Yao, Minghai
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2018, 11 (01) : 652 - 662
  • [5] A Comparison of Outlier Detection Techniques for High-Dimensional Data
    Xiaodan Xu
    Huawen Liu
    Li Li
    Minghai Yao
    International Journal of Computational Intelligence Systems, 2018, 11 : 652 - 662
  • [6] Thresholding-based outlier detection for high-dimensional data
    Yang, Xiaona
    Wang, Zhaojun
    Zi, Xuemin
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (11) : 2170 - 2184
  • [7] Research on Outlier Detection for High-Dimensional Data Based on PPCLOF
    Chen, Chen
    Luo, Kaiwen
    Min, Lan
    Li, Shenglin
    JOURNAL OF WEB ENGINEERING, 2021, 20 (03): : 743 - 758
  • [8] ROBOUT: a conditional outlier detection methodology for high-dimensional data
    Farne, Matteo
    Vouldis, Angelos
    STATISTICAL PAPERS, 2024, 65 (04) : 2489 - 2525
  • [9] On eigenfunction approach to data mining: outlier detection in high-dimensional data sets
    Nagar, AK
    Muyeba, MK
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004, : 251 - 256
  • [10] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Meng, Guanglei
    Wang, Biao
    Wu, Yanming
    Zhou, Mingzhe
    Meng, Tiankuo
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (11) : 3705 - 3718