An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering

Cited by: 9
Authors
Chakraborty, Bodhan [1]
Chaterjee, Agneet [2]
Malakar, Samir [3]
Sarkar, Ram [2]
Affiliations
[1] Univ Calcutta, Inst Radiophys & Elect, Kolkata, India
[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[3] Asutosh Coll, Dept Comp Sci, Kolkata, India
Keywords
Outlier detection; Unsupervised learning; Iterative approach; Ensemble method; Distance-based filtering; Dunn index
DOI
10.1007/s40747-022-00674-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier (or anomaly) detection is the process of identifying data points whose properties differ from those of the rest of the data. It is important in domains such as fraud detection, network intrusion detection, and spam filtering. In this paper, we introduce a new outlier detection algorithm that combines an ensemble method with distance-based data filtering in an iterative approach to detect outliers in unlabeled data. The ensemble method clusters the unlabeled data and filters out potential isolated outliers by iteratively applying a cluster membership threshold until the Dunn index score of the clustering is maximized. The distance-based data filtering then removes potential outlier clusters from the clustered data, using the Euclidean distance of each data point from the majority cluster as the filtering factor and a distance threshold as the cutoff. The performance of our algorithm is evaluated on 10 real-world machine learning datasets. Finally, we compare its results with those of various supervised and unsupervised outlier detection algorithms using the Precision@n and F-score evaluation metrics.
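The abstract names three building blocks: the Dunn index, a Euclidean distance filter relative to the majority cluster, and Precision@n. The sketch below illustrates each of them in isolation on a toy dataset. It is not the authors' implementation. The function names, the use of the majority cluster's centroid as the reference point, the threshold value, and the toy data are all assumptions made for illustration, and the ensemble clustering and iterative thresholding steps are not reproduced.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def dunn_index(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Requires at least two clusters and at least one cluster with two
    distinct points (otherwise the denominator is zero).
    """
    clusters = [X[labels == k] for k in np.unique(labels)]
    max_diameter = max(pairwise_dist(c, c).max() for c in clusters)
    min_separation = min(
        pairwise_dist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter


def distance_from_majority(X, labels):
    """Distance of each point from the centroid of the largest cluster.

    Using the centroid is an assumption; the abstract only says
    "distance ... from the majority cluster".
    """
    values, counts = np.unique(labels, return_counts=True)
    majority = values[np.argmax(counts)]
    centroid = X[labels == majority].mean(axis=0)
    return np.linalg.norm(X - centroid, axis=1)


def precision_at_n(scores, y_true, n):
    """Fraction of true outliers among the n highest-scoring points."""
    top = np.argsort(scores)[::-1][:n]
    return y_true[top].sum() / n


# Toy data: four points in a tight square plus two distant points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1])   # hypothetical clustering result
y_true = np.array([0, 0, 0, 0, 1, 1])   # hypothetical ground truth (1 = outlier)

dist = distance_from_majority(X, labels)
is_outlier = dist > 5.0                  # hypothetical distance threshold

print("Dunn index:", dunn_index(X, labels))
print("Flagged as outliers:", is_outlier)
print("Precision@2:", precision_at_n(dist, y_true, 2))
```

A higher Dunn index indicates compact, well-separated clusters, which is why the abstract uses it as the stopping criterion for the iterative step. In the toy data the two distant points form the minority cluster and are the only ones that exceed the assumed threshold.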
Pages: 3215-3230
Number of pages: 16
Journal
Complex & Intelligent Systems, 2022, 8