A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams

Cited by: 107
Authors
Alghushairy, Omar [1, 2]
Alsini, Raed [1, 3]
Soule, Terence [1]
Ma, Xiaogang [1]
Affiliations
[1] Univ Idaho, Dept Comp Sci, Moscow, ID 83844 USA
[2] Univ Jeddah, Coll Comp Sci & Engn, Jeddah 23890, Saudi Arabia
[3] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah 21589, Saudi Arabia
Funding
National Science Foundation (USA);
Keywords
outlier detection; data science; local outlier factor; genetic algorithm; stream data mining; NOVELTY DETECTION; EFFICIENT; CLASSIFICATION;
DOI
10.3390/bdcc5010001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
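The global/local distinction and the density-based idea behind LOF can be illustrated with a minimal sketch. This is a brute-force version of the standard LOF definition (k-distance, reachability distance, local reachability density), not code from the paper; the function name `lof_scores`, the toy 1-D data, and the choice `k = 3` are assumptions made for illustration, and the sketch assumes all points are distinct.

```python
import math

def lof_scores(points, k):
    """Brute-force Local Outlier Factor for a small static dataset (O(n^2)).

    Illustrative sketch only: assumes distinct points and k < len(points).
    """
    n = len(points)
    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # k-neighborhood of each point as (distance, index) pairs
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        kd = dists[k - 1][0]
        k_dist.append(kd)
        # Points tied at the k-distance are included in the neighborhood.
        neighbors.append([(d, j) for d, j in dists if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical toy data: a dense cluster, a sparse cluster, and one point (15)
# that lies inside the global range [0, 180] but far from the dense cluster
# relative to that cluster's spacing.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,),
          (15.0,),
          (100.0,), (120.0,), (140.0,), (160.0,), (180.0,)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p[0], round(s, 2))
```

In this toy dataset the point at 15 receives a score well above 1, while the members of both clusters stay close to 1, even though the sparse cluster's spacing is larger than the gap that isolates 15. A range-based global test would not flag the value 15, which is the sense in which it is a local rather than a global outlier. Because every score depends on the neighborhoods of other points, a new arrival in a stream can change existing scores, which is one reason static LOF does not carry over directly to data streams.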
Pages: 1 / 24
Page count: 24
Related papers
50 in total
  • [21] Trajectory Outlier Detection on Trajectory Data Streams
    Cao, Keyan
    Liu, Yefan
    Meng, Gongjie
    Liu, Haoli
    Miao, Anchen
    Xu, Jingke
    [J]. IEEE ACCESS, 2020, 8 : 34187 - 34196
  • [22] Distributed Top-N Local Outlier Detection in Big Data
    Yan, Yizhou
    Cao, Lei
    Rundensteiner, Elke A.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 827 - 836
  • [24] Fast Memory Efficient Local Outlier Detection in Data Streams (Extended Abstract)
    Salehi, Mahsa
    Leckie, Christopher
    Bezdek, James C.
    Vaithianathan, Tharshan
    Zhang, Xuyun
    [J]. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 51 - 52
  • [25] Outlier detection algorithms in data mining systems
    Petrovskiy, MI
    [J]. PROGRAMMING AND COMPUTER SOFTWARE, 2003, 29 (04) : 228 - 237
  • [26] Outlier Detection in Graph Streams
    Aggarwal, Charu C.
    Zhao, Yuchen
    Yu, Philip S.
    [J]. IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 399 - 409
  • [27] Outlier Detection Algorithms in Data Mining Systems
    M. I. Petrovskiy
    [J]. Programming and Computer Software, 2003, 29 : 228 - 237
  • [28] Distance-based Outlier Detection in Data Streams
    Tran, Luan
    Fan, Liyue
    Shahabi, Cyrus
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (12) : 1089 - 1100
  • [29] Outlier Detection in Data Streams Using OLAP Cubes
    Heine, Felix
    [J]. NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2017, 2017, 767 : 29 - 36
  • [30] A Hybrid Clustering Algorithm for Outlier Detection in Data Streams
    Vijayarani, S.
    Jothi, P.
    [J]. INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (11) : 285 - 295