Testing of algorithms for anomaly detection in Big data using apache spark

Cited by: 0
Authors
Lighari, Sheeraz Niaz [1]
Hussain, Dil Muhammad Akbar [1]
Affiliation
[1] Aalborg Univ, Dept Energy Technol, Esbjerg, Denmark
Keywords
Big data; Security analytics; Machine learning
DOI
10.1109/CICN.2017.23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The constant growth in the size of networks and in the volume of data they produce has made data analysis very challenging, particularly once the data reaches the scale of big data, where detecting intrusions becomes even more difficult. At present, experts have very few tools and methods for analyzing big data for security purposes. Either we need to devise new tools, or we can use existing tools in a novel manner to perform big data security analysis. In this paper, we use Apache Spark, a big data tool, to analyze a large dataset for anomaly detection. Anomaly detection is performed with several machine learning algorithms: logistic regression, support vector machine, naive Bayes, decision trees, random forest, and k-means. All of these algorithms are, to a greater or lesser extent, capable of detecting anomalies in big data, but we need to know how efficiently each performs. The main objective of this investigation is to find the most efficient algorithm for anomaly detection. To this end, we compare their training time, prediction time, and accuracy. The analysis was carried out on the KDD Cup 99 dataset. Although this dataset is only megabytes in size, it serves our purpose here for big data security analytics.
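The comparison described in the abstract (training time, prediction time, and accuracy per algorithm) can be illustrated with a small timing harness. This is only a sketch under stated assumptions, not the authors' Spark code: it uses the Python standard library, and the names `benchmark`, `BenchmarkResult`, `MajorityClassModel`, `ThresholdModel`, `TRAIN`, and `TEST` are hypothetical. The two toy models stand in for the classifiers named in the abstract, and the tiny hand-made rows stand in for the KDD Cup 99 records.

```python
import time
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# One labelled record: (features, label). Assumption: label 1 means "anomaly".
Row = Tuple[Sequence[float], int]


@dataclass
class BenchmarkResult:
    name: str
    train_seconds: float
    predict_seconds: float
    accuracy: float


class MajorityClassModel:
    """Hypothetical baseline: always predicts the most common training label."""

    def fit(self, rows: List[Row]) -> "MajorityClassModel":
        labels = [label for _, label in rows]
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, features: Sequence[float]) -> int:
        return self.label_


class ThresholdModel:
    """Hypothetical model: flags a row when its first feature exceeds
    the midpoint between the two class means seen during training."""

    def fit(self, rows: List[Row]) -> "ThresholdModel":
        normal = [f[0] for f, y in rows if y == 0]
        anomalous = [f[0] for f, y in rows if y == 1]
        mean_normal = sum(normal) / len(normal)
        mean_anomalous = sum(anomalous) / len(anomalous)
        self.threshold_ = (mean_normal + mean_anomalous) / 2
        return self

    def predict(self, features: Sequence[float]) -> int:
        return 1 if features[0] > self.threshold_ else 0


def benchmark(name: str, model, train: List[Row], test: List[Row]) -> BenchmarkResult:
    """Measure the three quantities the abstract compares."""
    start = time.perf_counter()
    model.fit(train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [model.predict(features) for features, _ in test]
    predict_seconds = time.perf_counter() - start

    correct = sum(p == label for p, (_, label) in zip(predictions, test))
    return BenchmarkResult(name, train_seconds, predict_seconds, correct / len(test))


# Toy data (made up for illustration; not taken from KDD Cup 99).
TRAIN: List[Row] = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.9], 1), ([1.0], 1)]
TEST: List[Row] = [([0.15], 0), ([0.25], 0), ([0.95], 1), ([0.8], 1)]

if __name__ == "__main__":
    for name, model in [("majority", MajorityClassModel()), ("threshold", ThresholdModel())]:
        r = benchmark(name, model, TRAIN, TEST)
        print(f"{r.name}: train={r.train_seconds:.6f}s "
              f"predict={r.predict_seconds:.6f}s accuracy={r.accuracy:.2f}")
```

In a Spark setting, the same three measurements would presumably wrap the fit and transform calls of each estimator. Because Spark evaluates lazily, an action such as a count would be needed inside the timed region for the timings to be meaningful.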
Pages: 97-100
Page count: 4