Sub-Grid Partitioning Algorithm for Distributed Outlier Detection on Big Data

Cited by: 0
Authors
Sakr, Mohamed [1]
Atwa, Walid [1]
Keshk, Arabi [1]
Affiliations
[1] Menoufia Univ, Fac Comp & Informat, Dept Comp Sci, Menoufia, Egypt
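Keywords
Distributed processing; Local outlier factor; Outlier detection; Big data; Anomaly detection
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract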
Anomaly detection, or outlier detection, has become a major research problem in the era of big data. It is used in many applications, such as removing noise from signals and detecting credit card fraud. One type of outlier detection is density-based outlier detection, whose distinguishing strength is detecting outlier points in regions of different densities. One density-based algorithm is the Local Outlier Factor (LOF). LOF gives every point a score that quantifies its outlierness compared to other points. In this paper, we propose a new algorithm called the sub-grid partition (SGP) algorithm. The SGP algorithm helps calculate the LOF for big data in a distributed environment. It splits the tuples into small grids, and each grid is split into sub-grids. Sub-grids on the border are duplicated in every processing node so that the LOF can be calculated for every tuple in these grids. Duplicating sub-grids increases the number of tuples to be processed but, on the other hand, reduces the network overhead required for communication between processing nodes and the time a processing node sits idle waiting for a requested tuple. Finally, we evaluate the performance of the SGP algorithm through a series of simulation experiments over real data sets.
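The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: a brute-force LOF score, and a grid partition in which tuples in border sub-grids are copied to neighbouring partitions. The function names, the uniform square grid, the `grid_size` and `subs_per_side` parameters, and the choice to copy border tuples only to adjacent grids are all assumptions made for illustration, not the authors' SGP algorithm.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def lof_scores(points, k):
    """Brute-force LOF for a small point list (O(n^2) distances).

    Assumes len(points) > k and no more than k exact duplicates of any point.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    k_dist, neigh = [], []
    for i in range(n):
        # k-distance: distance from point i to its k-th nearest other point
        kd = sorted(d[i][j] for j in range(n) if j != i)[k - 1]
        k_dist.append(kd)
        # k-neighbourhood: every other point within the k-distance (ties kept)
        neigh.append([j for j in range(n) if j != i and d[i][j] <= kd])
    # local reachability density = 1 / mean reachability distance
    lrd = [
        len(neigh[i]) / sum(max(k_dist[j], d[i][j]) for j in neigh[i])
        for i in range(n)
    ]
    # LOF = mean ratio of the neighbours' density to the point's own density
    return [
        sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


def partition_with_border_copies(points, grid_size, subs_per_side):
    """Map grid index -> list of (point, is_home).

    Hypothetical layout: each grid is a square of side grid_size, cut into
    subs_per_side sub-grids per axis. A point in a border sub-grid is also
    copied (is_home=False) to the grids adjacent to that border.
    """
    sub_size = grid_size / subs_per_side
    parts = defaultdict(list)
    for p in points:
        grid = tuple(floor(c / grid_size) for c in p)
        offsets = []
        for c, g in zip(p, grid):
            # sub-grid index along this axis, clamped against float rounding
            s = min(floor((c - g * grid_size) / sub_size), subs_per_side - 1)
            axis = [0]
            if s == 0:
                axis.append(-1)  # touches the lower border on this axis
            if s == subs_per_side - 1:
                axis.append(1)   # touches the upper border on this axis
            offsets.append(axis)
        for off in product(*offsets):
            target = tuple(g + o for g, o in zip(grid, off))
            parts[target].append((p, not any(off)))
    return parts
```

In a scheme like this, each processing node could run `lof_scores` over the tuples of one partition and keep the scores only for entries with `is_home` set, so no tuple has to be requested from another node. Whether one sub-grid of overlap is enough for exact scores depends on `k` and the data density, which the abstract does not specify.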
Pages: 252 - 257
Number of pages: 6
Related papers
50 in total
  • [31] A constrained sequential-lamination algorithm for the simulation of sub-grid microstructure in martensitic materials
    Aubry, S
    Fago, M
    Ortiz, M
    [J]. COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2003, 192 (26-27) : 2823 - 2843
  • [32] A Hybrid Outlier Detection Algorithm Based On Partitioning Clustering And Density Measures
    Rizk, Hamada
    Elgokhy, Sherin
    Sarhan, Amany
    [J]. 2015 TENTH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2015, : 175 - 181
  • [33] Data Grid tools: enabling science on big distributed data
    Allcock, B
    Chervenak, A
    Foster, I
    Kesselman, C
    Livny, M
    [J]. SCIDAC 2005: SCIENTIFIC DISCOVERY THROUGH ADVANCED COMPUTING, 2005, 16 : 571 - 575
  • [34] Uniform Partitioning of Data Grid for Association Detection
    Mousavi, Ali
    Baraniuk, Richard G.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 1098 - 1107
  • [35] Continuous adaptive outlier detection on distributed data streams
    Su, Liang
    Han, Weihong
    Yang, Shuqiang
    Zou, Peng
    Jia, Yan
    [J]. HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2007, 4782 : 74 - 85
  • [36] Distributed Big Data Mining Platform for Smart Grid
    Wang, Zhixiang
    Wu, Bin
    Bai, Demeng
    Qin, Jiafeng
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 2345 - 2354
  • [37] A Hybrid Outlier Detection Method for Health Care Big Data
    Yan, Ke
    You, Xiaoming
    Ji, Xiaobo
    Yin, Guangqiang
    Yang, Fan
    [J]. PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 157 - 162
  • [38] Robust local outlier detection with statistical parameter for big data
    Lei, Jingsheng
    Jiang, Teng
    Wu, Kui
    Du, Haizhou
    Zhu, Lin
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2015, 30 (05) : 411 - 419
  • [39] Spatiotemporal data partitioning for distributed random forest algorithm: Air quality prediction using imbalanced big spatiotemporal data on spark distributed framework
    Asgari, Marjan
    Yang, Wanhong
    Farnaghi, Mahdi
    [J]. ENVIRONMENTAL TECHNOLOGY & INNOVATION, 2022, 27
  • [40] An Efficient Distributed Algorithm for Big Data Processing
    Al-kahtani, Mohammed S.
    Karim, Lutful
    [J]. Arabian Journal for Science and Engineering, 2017, 42 : 3149 - 3157