Distance-Based Outlier Detection: Consolidation and Renewed Bearing

Cited by: 66
Authors
Orair, Gustavo H. [1]
Teixeira, Carlos H. C. [1]
Wang, Ye [2]
Parthasarathy, Srinivasan [2]
Affiliations
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010 / Vol. 3 / Issue 02
DOI
10.14778/1920841.1921021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Detecting outliers in data is an important problem with interesting applications in a myriad of domains, ranging from data cleaning to financial fraud detection and from network intrusion detection to clinical diagnosis of diseases. Over the last decade of research, distance-based outlier detection algorithms have emerged as a viable, scalable, parameter-free alternative to the more traditional statistical approaches. In this paper, we assess several distance-based outlier detection approaches and evaluate them. We begin by surveying and examining the design landscape of extant approaches, while identifying key design decisions of such approaches. We then implement an outlier detection framework and conduct a factorial design experiment to understand the pros and cons of various optimizations proposed by us as well as those proposed in the literature, both independently and in conjunction with one another, on a diverse set of real-life datasets. To the best of our knowledge, this is the first such study in the literature. The outcome of this study is a family of state-of-the-art distance-based outlier detection algorithms. Our detailed empirical study supports the following observations. The combination of optimization strategies enables significant efficiency gains. Our factorial design study highlights the important fact that no single optimization or combination of optimizations (factors) always dominates on all types of data. Our study also allows us to characterize when a certain combination of optimizations is likely to prevail and helps provide interesting and useful insights for moving forward in this domain.
Pages: 1469 - 1480
Page count: 12
Related papers
50 in total
  • [1] Distance-based Outlier Detection in Data Streams
    Tran, Luan
    Fan, Liyue
    Shahabi, Cyrus
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (12) : 1089 - 1100
  • [2] GPU Strategies for Distance-Based Outlier Detection
    Angiulli, Fabrizio
    Basta, Stefano
    Lodi, Stefano
    Sartori, Claudio
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (11) : 3256 - 3268
  • [3] Multi-Tactic Distance-based Outlier Detection
    Cao, Lei
    Yan, Yizhou
    Kuhlman, Caitlin
    Wang, Qingyang
    Rundensteiner, Elke A.
    Eltabakh, Mohamed
    [J]. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017 : 959 - 970
  • [4] Explainable Distance-Based Outlier Detection in Data Streams
    Toliopoulos, Theodoros
    Gounaris, Anastasios
    [J]. IEEE ACCESS, 2022, 10 : 47921 - 47936
  • [5] Efficient Pruning Schemes for Distance-Based Outlier Detection
    Vu, Nguyen Hoang
    Gopalkrishnan, Vivekanand
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 160 - 175
  • [6] Adaptivity in continuous massively parallel distance-based outlier detection
    Theodoros Toliopoulos
    Anastasios Gounaris
    [J]. Computing, 2022, 104 : 2659 - 2684
  • [7] Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection
    Radovanovic, Milos
    Nanopoulos, Alexandros
    Ivanovic, Mirjana
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (05) : 1369 - 1382
  • [8] A distance-based method for outlier detection on high dimensional datasets
    Carmona, J.
    Lopez, I
    Mateo, J.
    Jimenez, L.
    Aldana, E.
    [J]. IEEE LATIN AMERICA TRANSACTIONS, 2020, 18 (03) : 589 - 597
  • [9] An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection
    Flexer, Arthur
    [J]. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016 : 716 - 723
  • [10] Adaptivity in continuous massively parallel distance-based outlier detection
    Toliopoulos, Theodoros
    Gounaris, Anastasios
    [J]. COMPUTING, 2022, 104 (12) : 2659 - 2684