Fast distributed outlier detection in mixed-attribute data sets

Cited by: 116
Authors
Otey, ME [1]
Ghoting, A [1]
Parthasarathy, S [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
Keywords
outlier detection; anomaly detection; distributed data mining; mining dynamic data; mixed-attribute data sets; data streams
DOI
10.1007/s10618-005-0014-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.
Pages: 203-228
Page count: 26