An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering

Cited by: 9
Authors
Chakraborty, Bodhan [1 ]
Chaterjee, Agneet [2 ]
Malakar, Samir [3 ]
Sarkar, Ram [2 ]
Affiliations
[1] Univ Calcutta, Inst Radiophys & Elect, Kolkata, India
[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[3] Asutosh Coll, Dept Comp Sci, Kolkata, India
Keywords
Outlier detection; Unsupervised learning; Iterative approach; Ensemble method; Distance-based filtering; Dunn index;
DOI
10.1007/s40747-022-00674-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier (or anomaly) detection is the process of identifying data points whose properties differ from those of the rest of the data. It is important in domains such as fraud detection, network intrusion detection, and spam filtering. In this paper, we introduce a new outlier detection algorithm that combines an ensemble method and distance-based data filtering in an iterative approach to detect outliers in unlabeled data. The ensemble method clusters the unlabeled data and filters out potential isolated outliers by iteratively applying a cluster membership threshold until the Dunn index score of the clustering is maximized. The distance-based data filtering, on the other hand, removes potential outlier clusters from the clustered data based on a distance threshold, using the Euclidean distance of each data point from the majority cluster as the filtering factor. We evaluate the algorithm on 10 real-world machine learning datasets and compare its results with those of various supervised and unsupervised outlier detection algorithms using the Precision@n and F-score evaluation metrics.
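The three quantities named in the abstract (the Dunn index, a Euclidean distance from the majority cluster, and Precision@n) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the ensemble clustering step is omitted (cluster labels are taken as given), and measuring the distance to the majority cluster through its centroid is an assumption, since the abstract does not say how that distance is defined.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def dunn_index(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Requires at least two clusters; higher values mean compact, well-separated clusters.
    """
    clusters = [X[labels == k] for k in np.unique(labels)]
    max_diameter = max(pairwise_dist(c, c).max() for c in clusters)
    min_separation = min(
        pairwise_dist(a, b).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


def distance_to_majority(X, labels):
    """Distance of each point to the centroid of the largest cluster.

    Using the centroid is an assumption made for this sketch.
    """
    values, counts = np.unique(labels, return_counts=True)
    majority = values[np.argmax(counts)]
    centroid = X[labels == majority].mean(axis=0)
    return np.linalg.norm(X - centroid, axis=1)


def precision_at_n(scores, y_true, n):
    """Fraction of true outliers among the n points with the highest scores."""
    top_n = np.argsort(-scores)[:n]
    return float(np.mean(y_true[top_n]))


# Toy data: four points in a tight square plus two far-away points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 0, 0, 1, 1])   # assumed output of some clustering step
y_true = np.array([0, 0, 0, 0, 1, 1])   # 1 marks a true outlier

dist = distance_to_majority(X, labels)
flagged = dist > 5.0                     # 5.0 is an arbitrary illustrative threshold
print(dunn_index(X, labels), flagged, precision_at_n(dist, y_true, n=2))
```

In an iterative scheme such as the one the abstract describes, a score like `dunn_index` would be recomputed after each filtering pass and the pass with the highest score kept; the stopping rule and thresholds used in the paper are not given in the abstract.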
Pages: 3215-3230
Page count: 16