Detecting and ranking outliers in high-dimensional data

Authors
Amardeep Kaur
Amitava Datta
Affiliation
[1] University of Western Australia, School of Computer Science and Software Engineering
Keywords
Data mining; Outlier detection; High-dimensional data
DOI
Not available
Abstract
Detecting outliers in high-dimensional data is a challenging problem. In such data, the outlying behaviour of a data point can only be detected in locally relevant subsets of the data dimensions. These subsets of dimensions are called subspaces, and their number grows exponentially with data dimensionality. A data point that is an outlier in one subspace can appear normal in another. To characterise an outlier, it is therefore important to measure its outlying behaviour by the number of subspaces in which it appears as an outlier. This additional detail can help a data analyst decide whether to remove an outlier, fix it, or keep it unchanged in the dataset. In this paper, we propose an effective outlier detection algorithm for high-dimensional data based on SUBSCALE, a recent density-based clustering algorithm. We also rank the outliers by the strength of their outlying behaviour. Our outlier detection and ranking algorithm makes no assumptions about the underlying data distribution and can adapt to different density parameter settings. In experiments on several datasets, the top-ranked outliers were identified with both precision and recall above 82%.
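The ranking idea summarised above (score each point by the number of subspaces in which it looks like an outlier) can be illustrated with a small sketch. This is not the authors' SUBSCALE-based algorithm: it swaps in a naive, exhaustive enumeration of subspaces and a simple neighbour-count density test. The names `count_subspaces`, `is_sparse`, `rank_outliers`, `eps`, `min_pts` and `max_dim` are hypothetical, chosen only for illustration. Because a d-dimensional dataset has 2^d - 1 non-empty subspaces, this brute-force approach is only practical for very small d, which is the blow-up the paper's method is meant to avoid.

```python
from itertools import combinations


def count_subspaces(d):
    # Every non-empty subset of the d dimensions is a candidate subspace.
    return 2 ** d - 1


def is_sparse(points, i, dims, eps, min_pts):
    # Hypothetical density test (not SUBSCALE): point i counts as an outlier in
    # subspace `dims` if fewer than min_pts other points lie within eps of it
    # in every dimension of `dims` (an L-infinity neighbourhood in the subspace).
    p = points[i]
    neighbours = 0
    for j, q in enumerate(points):
        if j != i and all(abs(p[k] - q[k]) <= eps for k in dims):
            neighbours += 1
    return neighbours < min_pts


def rank_outliers(points, eps, min_pts, max_dim=None):
    # Score = number of enumerated subspaces in which the point is sparse.
    d = len(points[0])
    max_dim = d if max_dim is None else max_dim
    scores = [0] * len(points)
    for size in range(1, max_dim + 1):
        for dims in combinations(range(d), size):
            for i in range(len(points)):
                if is_sparse(points, i, dims, eps, min_pts):
                    scores[i] += 1
    # Highest score first; ties broken by point index.
    order = sorted(range(len(points)), key=lambda i: (-scores[i], i))
    return [(i, scores[i]) for i in order]


if __name__ == "__main__":
    # Point 4 is far away in dimension 0 but ordinary in dimension 1.
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 0.1)]
    print(count_subspaces(len(pts[0])))
    print(rank_outliers(pts, eps=0.5, min_pts=2))
```

In this toy example, point 4 is sparse in subspaces {0} and {0, 1} but normal in {1}. It therefore receives a score of 2 out of 3 and is ranked first, while the other points score 0. This mirrors the observation that a point can be an outlier in one subspace and appear normal in another.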
Pages: 75-87
Number of pages: 12