An Efficient Density-Based Local Outlier Detection Approach for Scattered Data

Cited by: 19
Authors
Su, Shubin [2 ]
Xiao, Limin [1 ,2 ]
Ruan, Li [2 ]
Gu, Fei [2 ]
Li, Shupan [2 ]
Wang, Zhaokai [2 ]
Xu, Rongbin [2 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Outlier detection; local outlier factor; neighborhood variance; rough clustering; scattered dataset; DISTANCE-BASED OUTLIERS; ALGORITHMS;
DOI
10.1109/ACCESS.2018.2886197
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Since the local outlier factor (LOF) was first proposed, a large family of local outlier detection approaches has been derived from it. Because the existing approaches consider only the overall separation between an object and its neighbors and ignore the degree of dispersion between them, their precision degrades to varying degrees on scattered datasets. In addition, although outliers make up only a small fraction of a dataset, the existing approaches compute the local outlier factor for every object during detection, which greatly reduces their efficiency. In this paper, we redefine the local outlier factor as the local deviation coefficient (LDC), which takes full advantage of the distribution of an object and its neighbors. We then propose a safe approach for eliminating non-outlier objects, rough clustering based on multi-level queries (RCMLQ), which preprocesses a dataset to remove as many non-outlier objects as possible. Finally, we propose an efficient local outlier detection approach, efficient density-based local outlier detection for scattered data (E2DLOS), based on the LDC and RCMLQ. RCMLQ greatly reduces the number of objects for which the local outlier factor must be computed, and the LDC is more sensitive to the degree of anomaly in scattered datasets, so E2DLOS improves on the existing local outlier detection approaches in both time efficiency and detection accuracy. Experiments show that the LDC better reflects the true abnormal situation of the data in scattered datasets. RCMLQ can also be used alongside traditional methods for speeding up nearest-neighbor search, which further improves the efficiency of the E2DLOS algorithm by about 16%.
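For orientation, the classic local outlier factor that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the baseline only, not the paper's LDC, RCMLQ, or E2DLOS, none of which the abstract defines. The function names `knn` and `lof_scores` are hypothetical. The sketch assumes exactly k neighbors per point (ties at the k-distance are ignored) and no heavily duplicated points. Note that it scores every point, which is the cost the abstract says RCMLQ is meant to reduce.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Classic LOF score for every point (simplified: ties at k-distance ignored).

    Assumes len(points) > k and that no point has k or more exact duplicates,
    otherwise a reachability sum can be zero.
    """
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [nb[-1][0] for nb in neigh]

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean neighbor density divided by the point's own density
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 3))
```

A score near 1 means a point is about as dense as its neighbors; a score well above 1 marks a candidate local outlier. In the toy data above, the four corner points score 1 and the isolated point (5, 5) scores much higher.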
Pages: 1006-1020
Number of pages: 15
Related papers
50 in total
  • [1] A local density-based approach for outlier detection
    Tang, Bo
    He, Haibo
    [J]. NEUROCOMPUTING, 2017, 241 : 171 - 180
  • [2] N2DLOF: A New Local Density-Based Outlier Detection Approach for Scattered Data
    Su, Shubin
    Xiao, Limin
    Zhang, Zhoujie
    Gu, Fei
    Ruan, Li
    Li, Shupan
    He, Zhenxue
    Huo, Zhisheng
    Yan, Baicheng
    Wang, Haitao
    Liu, Shaobo
    [J]. 2017 19TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS (HPCC) / 2017 15TH IEEE INTERNATIONAL CONFERENCE ON SMART CITY (SMARTCITY) / 2017 3RD IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (DSS), 2017, : 458 - 465
  • [3] Density-Based Local Outlier Detection on Uncertain Data
    Cao, Keyan
    Shi, Lingxu
    Wang, Guoren
    Han, Donghong
    Bai, Mei
    [J]. WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 67 - 71
  • [4] A local density-based outlier detection method for high dimension data
    Abdulghafoor, Shahad Adel
    Mohamed, Lekaa Ali
    [J]. INTERNATIONAL JOURNAL OF NONLINEAR ANALYSIS AND APPLICATIONS, 2022, 13 (01): : 1683 - 1699
  • [5] An efficient algorithm for distributed density-based outlier detection on big data
    Bai, Mei
    Wang, Xite
    Xin, Junchang
    Wang, Guoren
    [J]. NEUROCOMPUTING, 2016, 181 : 19 - 28
  • [6] Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
    Tang, Jialing
    Ngan, Henry Y. T.
    [J]. INFORMATION TECHNOLOGY IN INDUSTRY, 2016, 4 (01): : 6 - 18
  • [7] DWOF: A Robust Density-Based Outlier Detection Approach
    Momtaz, Rana
    Mohssen, Nesma
    Gowayyed, Mohammad A.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2013, 2013, 7887 : 517 - 525
  • [8] Boundary-aware local Density-based outlier detection
    Aydin, Fatih
    [J]. INFORMATION SCIENCES, 2023, 647
  • [9] A Fast Randomized Method for Local Density-Based Outlier Detection in High Dimensional Data
    Minh Quoc Nguyen
    Omiecinski, Edward
    Mark, Leo
    Irani, Danesh
    [J]. DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, 2010, 6263 : 215 - 226
  • [10] TADILOF: Time Aware Density-Based Incremental Local Outlier Detection in Data Streams
    Huang, Jen-Wei
    Zhong, Meng-Xun
    Jaysawal, Bijay Prasad
    [J]. SENSORS, 2020, 20 (20) : 1 - 25