Detecting and ranking outliers in high-dimensional data

Authors
Amardeep Kaur
Amitava Datta
Affiliation
School of Computer Science and Software Engineering, University of Western Australia
Keywords
Data mining; Outlier detection; High-dimensional data
DOI
Not available
Abstract
Detecting outliers in high-dimensional data is a challenging problem. In high-dimensional data, the outlying behaviour of data points can only be detected in locally relevant subsets of the data dimensions. These subsets of dimensions are called subspaces, and their number grows exponentially with the dimensionality of the data: a d-dimensional dataset has 2^d − 1 non-empty subspaces. A data point that is an outlier in one subspace can appear normal in another. To characterise an outlier, it is therefore important to measure its outlying behaviour by the number of subspaces in which it shows up as an outlier. This additional information can help a data analyst decide whether to remove an outlier, fix it, or keep it unchanged in the dataset. In this paper, we propose an effective outlier detection algorithm for high-dimensional data based on SUBSCALE, a recent density-based clustering algorithm. We also rank outliers by the strength of their outlying behaviour. Our outlier detection and ranking algorithm makes no assumptions about the underlying data distribution and adapts to different density parameter settings. In experiments on several datasets, the top-ranked outliers were predicted with both precision and recall above 82%.
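The scoring idea in the abstract, counting the subspaces in which a point looks like an outlier, can be illustrated with a deliberately naive Python sketch. It is not SUBSCALE, whose internals the abstract does not describe: it swaps in a brute-force, DBSCAN-style density test (a point is treated as low-density in a subspace if fewer than `min_pts` points, itself included, lie within distance `eps` of it there) and only enumerates subspaces of up to `max_dim` dimensions. The names `count_subspaces`, `is_low_density`, `rank_outliers`, `eps`, `min_pts` and `max_dim` are hypothetical, chosen for illustration only.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def count_subspaces(d: int) -> int:
    """Number of non-empty subsets of d dimensions, i.e. 2**d - 1."""
    return 2 ** d - 1


def is_low_density(points: Sequence[Point], i: int, dims: Tuple[int, ...],
                   eps: float, min_pts: int) -> bool:
    """True if point i has fewer than min_pts points (itself included) within
    Euclidean distance eps, measured only over the dimensions in dims.
    Hypothetical DBSCAN-style stand-in, not the SUBSCALE density criterion."""
    p = [points[i][k] for k in dims]
    neighbours = sum(1 for q in points if dist(p, [q[k] for k in dims]) <= eps)
    return neighbours < min_pts


def rank_outliers(points: Sequence[Point], eps: float, min_pts: int,
                  max_dim: int = 2) -> List[Tuple[int, int]]:
    """Score each point by the number of subspaces (1 to max_dim dimensions)
    in which it is low-density; return (index, score) pairs for flagged
    points, highest score first. Exhaustive enumeration, for illustration."""
    d = len(points[0])
    scores = [0] * len(points)
    for size in range(1, max_dim + 1):
        for dims in combinations(range(d), size):
            for i in range(len(points)):
                if is_low_density(points, i, dims, eps, min_pts):
                    scores[i] += 1
    # sorted() is stable, so ties keep their original index order.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked if scores[i] > 0]


# Toy data: five points clustered near (1, 1, 1) and one point that
# deviates only along the third dimension (index 2).
points = [
    (1.0, 1.0, 1.0),
    (1.1, 0.9, 1.0),
    (0.9, 1.1, 1.1),
    (1.0, 1.0, 0.9),
    (1.1, 1.1, 1.0),
    (1.0, 1.0, 5.0),
]

print(count_subspaces(3))                           # → 7
print(rank_outliers(points, eps=0.5, min_pts=3))    # → [(5, 3)]
```

With `max_dim=2`, the sixth point is flagged in the three subspaces that include the third dimension and looks normal in the other three, mirroring the abstract's observation that an outlier in one subspace can appear normal in another. Because the loops visit every subspace up to `max_dim`, the cost grows combinatorially with the dimensionality, which is why exhaustive enumeration is impractical for high-dimensional data.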
Pages: 75–87
Page count: 13
Related papers
  • [1] Detecting and ranking outliers in high-dimensional data
    Kaur, Amardeep
    Datta, Amitava
    International Journal of Advances in Engineering Sciences and Applied Mathematics, 2019, 11(1): 75–87
  • [2] Detecting Projected Outliers in High-Dimensional Data Streams
    Zhang, Ji
    Gao, Qigang
    Wang, Hai
    Liu, Qing
    Xu, Kai
    Database and Expert Systems Applications, Proceedings, 2009, 5690: 629 ff.
  • [3] A kernel-based approach for detecting outliers of high-dimensional biological data
    Oh, Jung Hun
    Gao, Jean
    BMC Bioinformatics, 2009, 10
  • [4] SPOT: A system for detecting projected outliers from high-dimensional data streams
    Zhang, Ji
    Gao, Qigang
    Wang, Hai
    2008 IEEE 24th International Conference on Data Engineering, Vols 1–3, 2008: 1628 ff.
  • [5] Selective Value Coupling Learning for Detecting Outliers in High-Dimensional Categorical Data
    Pang, Guansong
    Xu, Hongzuo
    Cao, Longbing
    Zhao, Wentao
    CIKM '17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017: 807–816
  • [6] OutRank: ranking outliers in high dimensional data
    Mueller, Emmanuel
    Assent, Ira
    Steinhausen, Uwe
    Seidl, Thomas
    2008 IEEE 24th International Conference on Data Engineering Workshop, Vols 1 and 2, 2008: 259–262
  • [7] Hiding outliers in high-dimensional data spaces
    Steinbuss, G.
    Böhm, K.
    International Journal of Data Science and Analytics, 2017, 4(3): 173–189
  • [8] Sparse PCA for High-Dimensional Data With Outliers
    Hubert, Mia
    Reynkens, Tom
    Schmitt, Eric
    Verdonck, Tim
    Technometrics, 2016, 58(4): 424–434
  • [9] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    2007 IEEE International Conference on Systems, Man and Cybernetics, Vols 1–8, 2007: 3961–3966