A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes

Cited by: 79
Authors
Koufakou, Anna [1, 2]
Georgiopoulos, Michael [2]
Affiliations
[1] Florida Gulf Coast Univ, UA Whitaker Sch Engn, Ft Myers, FL 33965 USA
[2] Univ Cent Florida, Sch EECS, Orlando, FL 32816 USA
Funding
National Science Foundation (USA);
Keywords
Outlier detection; Anomaly detection; Data mining; Distributed data sets; Mixed attribute data sets; High-dimensional data sets;
DOI
10.1007/s10618-009-0148-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier detection has attracted substantial attention in many applications and research areas; some of the most prominent applications are network intrusion detection and credit card fraud detection. Many of the existing approaches are based on calculating distances among the points in the dataset. These approaches cannot easily adapt to current datasets that usually contain a mix of categorical and continuous attributes, and may be distributed among different geographical locations. In addition, current datasets usually have a large number of dimensions. These datasets tend to be sparse, and traditional concepts such as Euclidean distance or nearest neighbor become unsuitable. We propose a fast distributed outlier detection strategy intended for datasets containing mixed attributes. The proposed method takes into consideration the sparseness of the dataset, and is experimentally shown to be highly scalable with the number of points and the number of attributes in the dataset. Experimental results show that the proposed outlier detection method compares very favorably with other state-of-the-art outlier detection strategies proposed in the literature and that the speedup achieved by its distributed version is very close to linear.
Pages: 259 - 289
Page count: 31
Related papers
50 in total
  • [1] A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
    Anna Koufakou
    Michael Georgiopoulos
    [J]. Data Mining and Knowledge Discovery, 2010, 20 : 259 - 289
  • [2] Projected outlier detection in high-dimensional mixed-attributes data set
    Ye, Mao
    Li, Xue
    Orlowska, Maria E.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 7104 - 7113
  • [3] Fast Distributed Outlier Detection in Mixed-Attribute Data Sets
    Matthew Eric Otey
    Amol Ghoting
    Srinivasan Parthasarathy
    [J]. Data Mining and Knowledge Discovery, 2006, 12 : 203 - 228
  • [4] Fast distributed outlier detection in mixed-attribute data sets
    Otey, ME
    Ghoting, A
    Parthasarathy, S
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 12 (2-3) : 203 - 228
  • [5] Ordinal Outlier Algorithm for Anomaly Detection of High-Dimensional Data Sets
    Chen, Gang
    Du, Linlin
    An, Baoran
    [J]. PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020 : 5356 - 5361
  • [6] On eigenfunction approach to data mining: outlier detection in high-dimensional data sets
    Nagar, AK
    Muyeba, MK
    [J]. 8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004 : 251 - 256
  • [7] Outlier detection for high-dimensional data
    Ro, Kwangil
    Zou, Changliang
    Wang, Zhaojun
    Yin, Guosheng
    [J]. BIOMETRIKA, 2015, 102 (03) : 589 - 599
  • [8] Fast outlier detection for high-dimensional data of wireless sensor networks
    Qiao, Yan
    Cui, Xinhong
    Jin, Peng
    Zhang, Wu
    [J]. INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2020, 16 (10)
  • [9] Outlier mining in large high-dimensional data sets
    Angiulli, F
    Pizzuti, C
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) : 203 - 215
  • [10] Efficient Outlier Detection for High-Dimensional Data
    Liu, Huawen
    Li, Xuelong
    Li, Jiuyong
    Zhang, Shichao
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (12) : 2451 - 2461