Hephaistos: A fast and distributed outlier detection approach for big mixed attribute data

Cited by: 4
Authors
Du, Haizhou [1]
Fang, Wei [1]
Wang, Yi [2]
Affiliations
[1] Shanghai University of Electric Power, School of Computer Science, Shanghai, People's Republic of China
[2] State Grid Zhejiang Hangzhou Xiaoshan Power Supply, Hangzhou, Zhejiang, People's Republic of China
Keywords
Mixed attribute data; clustering algorithm; local outlier detection; distributed framework; Spark platform
DOI
10.3233/IDA-184176
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper tackles a new problem in outlier detection: how to promptly detect local outliers in large-scale mixed-attribute data in the big data era. The problem is challenging because no individual compute machine has access to the entire mixed-attribute dataset. The proposed approach first applies a mechanism that prunes the large volume of clearly non-noise points and extracts a cluster-based pre-noise set. We then analyze the pre-noise set using a multi-step distributed LOF (local outlier factor) computation on the Spark platform. Finally, the approach outputs the list of points ordered by LOF. Comprehensive experiments were conducted on large-scale benchmark datasets using the Spark platform. The results show that our approach outperforms state-of-the-art techniques (4X faster than the baseline LOF and 2X faster than DLOF), and it is therefore expected to provide better guidance for local outlier detection in mixed-attribute data.
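The three-stage pipeline described in the abstract (prune clear non-noise via clustering, compute LOF on the remaining pre-noise set, output points ordered by LOF) can be sketched on a single machine as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gower-style distance for mixed attributes, uses a one-pass leader clustering as a stand-in for the paper's cluster-based pruning, and replaces the distributed, multi-step Spark computation with brute-force k-nearest-neighbour search. The names `gower_distance`, `pre_noise`, `lof_scores`, `detect` and the parameters `k`, `radius`, `min_size` are hypothetical, with defaults chosen only for the toy data.

```python
import math

# Hypothetical example schema: attributes 0 and 1 are numeric, 2 is categorical.
NUM_IDX, CAT_IDX = (0, 1), (2,)


def numeric_ranges(data, num_idx):
    """Range (max - min) of each numeric attribute, used to normalize differences."""
    return {i: max(r[i] for r in data) - min(r[i] for r in data) for i in num_idx}


def gower_distance(a, b, num_idx, cat_idx, ranges):
    """Gower-style mixed-attribute distance (an assumption, not the paper's measure):
    range-normalized |difference| for numeric attributes, 0/1 mismatch for
    categorical ones, averaged over all attributes."""
    total = sum(abs(a[i] - b[i]) / ranges[i] for i in num_idx if ranges[i] > 0)
    total += sum(1.0 for i in cat_idx if a[i] != b[i])
    return total / (len(num_idx) + len(cat_idx))


def pre_noise(data, dist, radius, min_size):
    """Hypothetical stand-in for cluster-based pruning: one-pass leader clustering,
    then every member of a cluster smaller than min_size is kept as a pre-noise
    candidate; members of large clusters are treated as clear non-noise."""
    leaders, members = [], []
    for i, row in enumerate(data):
        for c, lead in enumerate(leaders):
            if dist(row, data[lead]) <= radius:
                members[c].append(i)
                break
        else:  # no leader within radius: start a new cluster
            leaders.append(i)
            members.append([i])
    return [i for m in members if len(m) < min_size for i in m]


def lof_scores(data, candidates, k, dist):
    """Brute-force LOF for the candidate indices; neighbourhoods span the full
    dataset. Simplification: exactly k neighbours, ties at the k-distance ignored."""
    cache = {}

    def neighbours(p):  # [(distance, index), ...] of the k nearest points to p
        if p not in cache:
            ds = sorted((dist(data[p], data[q]), q) for q in range(len(data)) if q != p)
            cache[p] = ds[:k]
        return cache[p]

    def lrd(p):  # local reachability density; reach-dist = max(k-distance(o), d(p, o))
        reach = [max(neighbours(o)[-1][0], d) for d, o in neighbours(p)]
        return len(reach) / sum(reach) if sum(reach) > 0 else math.inf

    scores = {}
    for p in candidates:
        nbrs = neighbours(p)
        # LOF(p) = mean over neighbours o of lrd(o) / lrd(p)
        scores[p] = sum(lrd(o) for _, o in nbrs) / (len(nbrs) * lrd(p))
    return scores


def detect(data, num_idx, cat_idx, k=3, radius=0.2, min_size=3):
    """Prune, score the pre-noise set with LOF, return (index, LOF) by descending LOF."""
    ranges = numeric_ranges(data, num_idx)

    def dist(a, b):
        return gower_distance(a, b, num_idx, cat_idx, ranges)

    candidates = pre_noise(data, dist, radius, min_size)
    scores = lof_scores(data, candidates, k, dist)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


DATA = [
    (1.0, 1.0, "a"), (1.1, 1.0, "a"), (1.0, 1.2, "a"), (1.2, 1.1, "a"), (0.9, 1.1, "a"),
    (5.0, 5.0, "b"), (5.1, 5.2, "b"), (4.9, 5.1, "b"), (5.2, 4.9, "b"),
    (9.0, 1.0, "c"),  # isolated in both numeric space and category
]

ranked = detect(DATA, NUM_IDX, CAT_IDX)
print(ranked)  # only index 9 survives pruning, with a LOF well above 1
```

The point of the pruning stage is that the final LOF ranking is computed only for pre-noise candidates (here, one of ten records), although the local reachability densities of those candidates' neighbours are still needed. In a distributed setting the kNN and LOF steps would be partitioned across workers; this sketch does not model that.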
Pages: 759-778
Number of pages: 20