Fast distributed outlier detection in mixed-attribute data sets

Times cited: 116

Authors:

Otey, ME [1]

Ghoting, A [1]

Parthasarathy, S [1]

Affiliation:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

DATA MINING AND KNOWLEDGE DISCOVERY | 2006 / Vol. 12 / No. 2-3

Funding:

National Science Foundation (NSF), USA

Keywords:

outlier detection; anomaly detection; distributed data mining; mining dynamic data; mixed-attribute data sets; data streams

DOI:

10.1007/s10618-005-0014-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.


Pages: 203-228

Number of pages: 26

Related articles (50 in total):

[1] Fast Distributed Outlier Detection in Mixed-Attribute Data Sets
Otey, Matthew Eric; Ghoting, Amol; Parthasarathy, Srinivasan
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 12 (2-3) : 203 - 228
[2] A practical outlier detection approach for mixed-attribute data
Bouguessa, Mohamed
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (22) : 8637 - 8649
[3] Hephaistos: A fast and distributed outlier detection approach for big mixed attribute data
Du, Haizhou; Fang, Wei; Wang, Yi
[J]. INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 759 - 778
[4] Detecting Network Anomalies in Mixed-Attribute Data Sets
Tran, Khoi-Nguyen; Jin, Huidong
[J]. THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010 : 383 - 386
[5] Missing Value Estimation for Mixed-Attribute Data Sets
Zhu, Xiaofeng; Zhang, Shichao; Jin, Zhi; Zhang, Zili; Xu, Zhuoming
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (01) : 110 - 121
[6] A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
Koufakou, Anna; Georgiopoulos, Michael
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 20 (02) : 259 - 289
[7] Random Mixed Field Model for Mixed-Attribute Data Restoration
Li, Qiang; Bian, Wei; Xu, Richard Yi Da; You, Jane; Tao, Dacheng
[J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016 : 1244 - 1250
[8] Clustering Mixed-Attribute Data using Random Walk
Skabar, Andrew
[J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108 : 988 - 997
[9] An Effective Pattern Based Outlier Detection Approach for Mixed Attribute Data
Zhang, Ke; Jin, Huidong
[J]. AI 2010: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2010, 6464 : 122 - 131
