Hephaistos: A fast and distributed outlier detection approach for big mixed attribute data

Cited by: 4
Authors
Du, Haizhou [1]
Fang, Wei [1]
Wang, Yi [2]
Affiliations
[1] Shanghai Univ Elect Power, Sch Comp Sci, Shanghai, Peoples R China
[2] State Grid Zhejiang Hangzhou Xiaoshan Power Suppl, Hangzhou, Zhejiang, Peoples R China
Keywords
Mixed attribute data; clustering algorithm; local outlier detection; distributed framework; Spark platform;
DOI
10.3233/IDA-184176
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper tackles a new problem in outlier detection: how to promptly detect local outliers in large-scale mixed attribute data in the big data era. The problem is challenging because no individual compute machine has access to the entire mixed attribute dataset. The proposed approach first forms a mechanism that deletes the large mass of clear non-noise points and extracts a cluster-based pre-noise set. It then analyzes the pre-noise set using a multi-step distributed LOF computing method on the Spark platform. Finally, the ordered LOF list is output as the result. Comprehensive experiments are conducted with large-scale benchmark datasets on the Spark platform. The results show that the approach outperforms state-of-the-art techniques (4x faster than baseline LOF and 2x faster than DLOF), and it is therefore believed to give better guidance for local outlier detection in mixed attribute data.
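For readers unfamiliar with the LOF score that the abstract builds on, the following is a minimal single-machine sketch of the standard local outlier factor computation, ending in a list ordered by score. It is not the paper's method: it omits the cluster-based pre-noise filtering and the distributed Spark steps, and it assumes purely numeric, distinct points with a Euclidean distance, whereas the paper targets mixed attribute data. The function names `lof_scores` and `ranked_outliers` are hypothetical.

```python
import math


def lof_scores(points, k):
    """Return the LOF score of every point (plain, single-machine LOF).

    Assumptions for this sketch: numeric tuples, all points distinct,
    exactly k neighbours per point (distance ties are not expanded).
    """
    n = len(points)
    # k nearest neighbours of each point as (distance, index) pairs.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)
    ]


def ranked_outliers(points, k):
    """Return (index, LOF score) pairs ordered from most to least outlying."""
    scores = lof_scores(points, k)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for index, score in ranked_outliers(data, k=2):
        print(index, round(score, 3))
```

In this toy dataset the isolated point at index 4 comes first in the ordered list, while the four points of the unit square score 1, meaning their density matches that of their neighbours. The all-pairs distance computation is quadratic in the number of points, which is the cost that pre-filtering and distribution are meant to reduce.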
Pages: 759-778
Number of pages: 20