Fast distributed outlier detection in mixed-attribute data sets

Cited by: 116
Authors
Otey, ME [1]
Ghoting, A [1]
Parthasarathy, S [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
US National Science Foundation
Keywords
outlier detection; anomaly detection; distributed data mining; mining dynamic data; mixed-attribute data sets; data streams
DOI
10.1007/s10618-005-0014-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.
Pages: 203-228
Page count: 26
Related papers
50 in total
  • [1] Fast Distributed Outlier Detection in Mixed-Attribute Data Sets
    Matthew Eric Otey
    Amol Ghoting
    Srinivasan Parthasarathy
    [J]. Data Mining and Knowledge Discovery, 2006, 12 : 203 - 228
  • [2] A practical outlier detection approach for mixed-attribute data
    Bouguessa, Mohamed
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (22) : 8637 - 8649
  • [3] Hephaistos: A fast and distributed outlier detection approach for big mixed attribute data
    Du, Haizhou
    Fang, Wei
    Wang, Yi
    [J]. INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 759 - 778
  • [4] Detecting Network Anomalies in Mixed-Attribute Data Sets
    Tran, Khoi-Nguyen
    Jin, Huidong
    [J]. THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010, : 383 - 386
  • [5] Missing Value Estimation for Mixed-Attribute Data Sets
    Zhu, Xiaofeng
    Zhang, Shichao
    Jin, Zhi
    Zhang, Zili
    Xu, Zhuoming
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (01) : 110 - 121
  • [6] A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
    Koufakou, Anna
    Georgiopoulos, Michael
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 20 (02) : 259 - 289
  • [8] Random Mixed Field Model for Mixed-Attribute Data Restoration
    Li, Qiang
    Bian, Wei
    Xu, Richard Yi Da
    You, Jane
    Tao, Dacheng
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1244 - 1250
  • [9] Clustering Mixed-Attribute Data using Random Walk
    Skabar, Andrew
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108 : 988 - 997
  • [10] An Effective Pattern Based Outlier Detection Approach for Mixed Attribute Data
    Zhang, Ke
    Jin, Huidong
    [J]. AI 2010: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2010, 6464 : 122 - 131