Detection of local and clustered outliers based on the density-distance decision graph

Cited by: 15
Authors
Li, Kangsheng [1]
Gao, Xin [1]
Jia, Xin [1]
Xue, Bing [1]
Fu, Shiyuan [1]
Liu, Zhiyu [1]
Huang, Xu [1]
Huang, Zijian [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Outlier detection; Anomaly detection; Local reachable density; Kernel density estimation; Density lifting distance; Density-distance decision graph;
DOI
10.1016/j.engappai.2022.104719
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Outlier detection aims to identify objects whose characteristics differ from those of normal observations. Most existing approaches detect outliers from a global perspective; they can effectively detect global outliers and most clustered outliers but fail to detect local outliers when the normal samples form clusters with different densities. Methods based on local outlier factors can effectively detect local outliers, but as the number of outliers increases, clustered outliers become more frequent and detection performance degrades. We propose an outlier detection method based on the density-distance decision graph that detects local, global and clustered outliers simultaneously. First, kernel density estimation and local reachable distance are combined to calculate the local density. The ratio of the density of an instance's neighbors to its own density is calculated as the degree of local outlierness. Then, we propose a metric named density lifting distance as the degree of global outlierness, which is calculated from the distances between the instance and its k nearest neighbors with higher density. The density ratio and density lifting distance are combined to draw the density-distance decision graph, and the product of the two metrics is taken as the final outlier score. Comprehensive experiments were conducted on 8 synthetic datasets and 16 real-world datasets, with comparison against 12 state-of-the-art methods. The results show that the proposed method works well both when the samples form clusters with different densities and when the percentage of outliers varies, and that it outperforms the state-of-the-art methods tested in terms of AUC.
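The pipeline outlined in the abstract (local density, density ratio, density lifting distance, product score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel over LOF-style reachability distances, the single global bandwidth `h`, the clamp `eps`, the fallback used for the densest point, and the function name `density_distance_scores` are all assumptions made for illustration.

```python
import numpy as np

def density_distance_scores(X, k=5, eps=1e-12):
    """Illustrative density-ratio x density-lifting-distance outlier score.

    Hypothetical sketch only; kernel, bandwidth and fallback are assumptions.
    Duplicate points (zero distances) are not handled.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # point is never selected as its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)

    knn_idx = np.argsort(D, axis=1)[:, :k]              # (n, k) neighbor indices
    knn_dist = np.take_along_axis(D, knn_idx, axis=1)   # (n, k) ascending distances
    k_dist = knn_dist[:, -1]                            # distance to the k-th neighbor

    # Reachability distance as in LOF: max(k-distance of neighbor, actual distance).
    reach = np.maximum(k_dist[knn_idx], knn_dist)

    # Assumed local density: Gaussian kernel evaluated on reachability
    # distances, with one global bandwidth h (mean k-distance). The clamp
    # avoids division by zero when the kernel underflows for remote points.
    h = k_dist.mean()
    density = np.exp(-0.5 * (reach / h) ** 2).mean(axis=1)
    density = np.maximum(density, eps)

    # Local degree: mean density of the neighbors divided by own density.
    ratio = density[knn_idx].mean(axis=1) / density

    # Global degree: mean distance to the (up to) k nearest points that have
    # strictly higher density than the point itself.
    lift = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(density > density[i])
        if higher.size == 0:
            # Assumed fallback for the densest point: its ordinary kNN distance.
            lift[i] = knn_dist[i].mean()
        else:
            lift[i] = np.sort(D[i, higher])[:k].mean()

    # Final score: product of the two metrics, as stated in the abstract.
    return ratio, lift, ratio * lift
```

Plotting `ratio` against `lift` gives a decision-graph-style view in which points that are large on both axes stand out; ranking by the returned product is one way to read that graph as a single score. The O(n^2) distance matrix keeps the sketch short and is only suitable for small datasets.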
Pages: 15
Related papers
50 in total
  • [1] Density-Distance Outlier Detection Algorithm Based on Natural Neighborhood
    Zhang, Jiaxuan
    Yang, Youlong
    [J]. AXIOMS, 2023, 12 (05)
  • [2] An automatic density peaks clustering based on a density-distance clustering index
    Xu, Xiao
    Liao, Hong
    Yang, Xu
    [J]. AIMS MATHEMATICS, 2023, 8 (12): 28926 - 28950
  • [3] Clustering based on density-distance and t mixture model in flow cytometry data
    Zhao, Qijie
    Ke, Zhennan
    Tao, Jing
    Lu, Jianxia
    [J]. Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2017, 38 (09): 2130 - 2137
  • [4] An ensemble-based outlier detection method for clustered and local outliers with differential potential spread loss
    Gao, Xin
    Yu, Jiahao
    Zha, Sen
    Fu, Shiyuan
    Xue, Bing
    Ye, Ping
    Huang, Zijian
    Zhang, Guangyao
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [5] Distance-based detection and prediction of outliers
    Angiulli, F
    Basta, S
    Pizzuti, C
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (02): 145 - 160
  • [6] HCDC: A novel hierarchical clustering algorithm based on density-distance cores for data sets with varying density
    Yang, Qi-Fen
    Gao, Wan-Yi
    Han, Gang
    Li, Zi-Yang
    Tian, Meng
    Zhu, Shu-Hua
    Deng, Yu-hui
    [J]. INFORMATION SYSTEMS, 2023, 114
  • [7] A novel temporal protein complexes identification framework based on density-distance and heuristic algorithm
    Xie, Dan
    Yi, Yang
    Zhou, Jin
    Li, Xiaodong
    Wu, Huikun
    [J]. NEURAL COMPUTING & APPLICATIONS, 2019, 31 (09): 4693 - 4701
  • [8] An improved distance-based outliers detection algorithm
    Tian, Sheng-wen
    Huang, Ming-ming
    [J]. General System and Control System, Vol I, 2007: 270 - 273
  • [9] An Improved Outlier Detection Algorithm Based on Distance Increment and Decision Graph Score
    Zhang, Yuanyuan
    He, Jinrong
    Liu, Zixi
    Cao, Yanan
    Shi, Aiqing
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019: 4186 - 4191
  • [10] Parallel algorithms for distance-based and density-based outliers
    Lozano, E
    Acuña, E
    [J]. Fifth IEEE International Conference on Data Mining, Proceedings, 2005: 729 - 732