An efficient algorithm for distributed density-based outlier detection on big data

Cited by: 55
Authors
Bai, Mei [1]
Wang, Xite [1]
Xin, Junchang [1]
Wang, Guoren [1]
Affiliation
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Density-based outlier; Local outlier factor; Distributed algorithm; Mining outliers; Cues
DOI
10.1016/j.neucom.2015.05.135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is a popular topic in data management and multimedia analysis, with applications such as noisy image detection, credit card fraud detection, and network intrusion detection. The density-based outlier is an important definition of outlier: it computes a Local Outlier Factor (LOF) for each tuple in a data set to represent the degree to which that tuple is an outlier, and it has several significant advantages over other existing definitions. This paper focuses on distributed density-based outlier detection for large-scale data. First, we propose a Grid-Based Partition algorithm (GBP) as a data preparation method: GBP splits the data set into grids and then allocates these grids to the datanodes of a distributed environment. Second, we propose a Distributed LOF Computing method (DLC) that detects density-based outliers in parallel while requiring only a small amount of network communication. Finally, a series of simulation experiments verifies the efficiency and effectiveness of the proposed approaches. (C) 2015 Elsevier B.V. All rights reserved.
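The abstract describes LOF only as a per-tuple outlier degree. As an illustration of the score being distributed, here is a minimal single-machine sketch of the standard LOF definition (k-distance, reachability distance, local reachability density), computed by brute force. It is not the paper's DLC method; the function name `lof_scores` and the neighbourhood size `k` are illustrative assumptions, and duplicate points (which make the density infinite) are not handled.

```python
import math
from typing import List, Sequence


def lof_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Brute-force LOF for every point (O(n^2) distances, single machine).

    Hypothetical helper illustrating the standard LOF definition; it is not
    the paper's distributed DLC method. Duplicate points are not handled.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance(p) and the k-nearest neighbourhood N_k(p), ties included.
    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    # Local reachability density: 1 / mean reach-dist_k(p, o) over N_k(p),
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF(p): mean of lrd(o) / lrd(p) over o in N_k(p). Values near 1 mean
    # "as dense as the neighbours"; values well above 1 suggest an outlier.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Four points on a unit square plus one far-away point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for p, s in zip(data, lof_scores(data, k=2)):
        print(p, round(s, 3))
```

In this example the four square corners score exactly 1.0 and the far point scores well above 1. The all-pairs distance matrix is what makes a single-machine approach impractical on big data, which is the motivation for partitioning the data and computing LOF in parallel.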
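GBP is summarized only as "split the data set into grids, then allocate the grids to datanodes". The following is a minimal sketch of that idea under assumptions the abstract does not state: uniform axis-aligned cells of a hypothetical side length `width`, and a greedy rule that hands the largest remaining cell to the currently least-loaded node. The helpers `grid_cells` and `assign_cells` are hypothetical, and GBP's actual allocation rule may differ.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_cells(points: List[Sequence[float]], width: float) -> Dict[Cell, List[int]]:
    """Group point indices by the axis-aligned grid cell that contains them.

    `width` (the cell side length) is an assumed parameter, not from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(x / width) for x in p)].append(idx)
    return dict(cells)


def assign_cells(cells: Dict[Cell, List[int]], num_nodes: int) -> Dict[int, List[Cell]]:
    """Allocate whole cells to nodes: largest cell first, to the least-loaded node.

    Assumed greedy load-balancing heuristic; GBP's actual rule may differ.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    heap = [(0, node) for node in range(num_nodes)]  # (points held, node id)
    heapq.heapify(heap)
    plan: Dict[int, List[Cell]] = {node: [] for node in range(num_nodes)}
    for cell, members in sorted(cells.items(), key=lambda kv: -len(kv[1])):
        load, node = heapq.heappop(heap)
        plan[node].append(cell)
        heapq.heappush(heap, (load + len(members), node))
    return plan


if __name__ == "__main__":
    pts = [(0.2, 0.3), (0.4, 0.1), (0.9, 0.8), (1.5, 0.2), (1.7, 1.9), (3.1, 3.2)]
    cells = grid_cells(pts, width=1.0)
    print(cells)
    print(assign_cells(cells, num_nodes=2))
```

Keeping each cell whole on one node preserves spatial locality. However, a point near a cell boundary can have k-nearest neighbours in an adjacent cell, so any distributed LOF computation over such a partition must exchange some boundary data. The abstract states that DLC keeps this communication small but does not say how.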
Pages: 19-28
Number of pages: 10