Detecting and ranking outliers in high-dimensional data

Cited by: 0

Authors:

Amardeep Kaur

Amitava Datta

Affiliation:

[1] University of Western Australia, School of Computer Science and Software Engineering

Source:

International Journal of Advances in Engineering Sciences and Applied Mathematics | 2019 / Vol. 11

Keywords:

Data mining; Outlier detection; High-dimensional data

DOI:

Not available


Abstract:

Detecting outliers in high-dimensional data is a challenging problem. In such data, the outlying behaviour of data points can only be detected in locally relevant subsets of the data dimensions. These subsets of dimensions are called subspaces, and their number grows exponentially as data dimensionality increases. A data point that is an outlier in one subspace can appear normal in another. To characterise an outlier, it is therefore important to measure its outlying behaviour according to the number of subspaces in which it shows up as an outlier. These additional details can help a data analyst decide what to do with an outlier: remove it, fix it, or keep it unchanged in the dataset. In this paper, we propose an effective outlier detection algorithm for high-dimensional data, based on a recent density-based clustering algorithm called SUBSCALE. We also rank outliers by the strength of their outlying behaviour. Our outlier detection and ranking algorithm makes no assumptions about the underlying data distribution and can adapt to different density parameter settings. We experimented with different datasets, and the top-ranked outliers were predicted with more than 82% precision as well as recall.
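Illustration (not part of the abstract): the idea of ranking a point by the number of subspaces in which it appears as an outlier can be sketched as follows. This is a minimal brute-force sketch, not the SUBSCALE-based algorithm the paper proposes. The function names, the `epsilon` and `min_pts` density parameters, the `max_dims` cap on subspace size, and the neighbour-count rule are all assumptions made for this example.

```python
from itertools import combinations


def outlier_subspace_counts(points, epsilon, min_pts, max_dims=2):
    """Count, for each point, the subspaces in which it looks like an outlier.

    Assumed rule (hypothetical, not SUBSCALE): a point is an outlier in a
    subspace if fewer than `min_pts` other points lie within `epsilon` of it
    on every dimension of that subspace.
    """
    n_dims = len(points[0])
    counts = [0] * len(points)
    # Enumerate every subspace of 1..max_dims dimensions. The number of
    # subspaces grows exponentially, so this only suits small examples.
    for size in range(1, max_dims + 1):
        for subspace in combinations(range(n_dims), size):
            for i, p in enumerate(points):
                neighbours = sum(
                    1
                    for j, q in enumerate(points)
                    if j != i
                    and all(abs(p[d] - q[d]) <= epsilon for d in subspace)
                )
                if neighbours < min_pts:
                    counts[i] += 1
    return counts


def rank_outliers(points, epsilon, min_pts, max_dims=2):
    """Return (point index, subspace count) pairs, strongest outlier first."""
    counts = outlier_subspace_counts(points, epsilon, min_pts, max_dims)
    order = sorted(range(len(points)), key=lambda i: counts[i], reverse=True)
    return [(i, counts[i]) for i in order if counts[i] > 0]


# Toy data: point 3 is unusual only on the third dimension,
# point 4 is unusual on every dimension.
points = [
    (1.0, 1.0, 1.0),
    (1.1, 0.9, 1.0),
    (0.9, 1.1, 1.1),
    (1.0, 1.0, 9.0),
    (8.0, 8.0, 5.0),
]
ranking = rank_outliers(points, epsilon=0.5, min_pts=2)
print(ranking)  # [(4, 6), (3, 3)]
```

In this toy data, point 3 looks normal in the subspaces that exclude the third dimension and is flagged only in the three subspaces that include it, while point 4 is flagged in all six subspaces examined and therefore ranks first.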


Pages: 75-87

Page count: 12


[1] Detecting and ranking outliers in high-dimensional data
Kaur, Amardeep
Datta, Amitava
[J]. International Journal of Advances in Engineering Sciences and Applied Mathematics, 2019, 11 (01): 75-87