Testing of algorithms for anomaly detection in Big data using apache spark

Authors
Lighari, Sheeraz Niaz [1]
Hussain, Dil Muhammad Akbar [1]
Affiliations
[1] Aalborg Univ, Dept Energy Technol, Esbjerg, Denmark
Keywords
Big data; Security analytics; Machine learning
DOI
10.1109/CICN.2017.23
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The constant growth in the size of networks, and in the volume of data they produce, has made data analysis very challenging, particularly once the data reaches the scale of big data, where detecting intrusions becomes even more difficult. At present, experts have very few tools and methods for analyzing big data for security purposes. We must either devise new tools or use existing tools in a novel manner to achieve big data security analysis. In this paper, we use Apache Spark, a big data tool, to analyze a large dataset for anomaly detection. The anomaly detection is performed with several machine learning algorithms: logistic regression, support vector machine, naive Bayes, decision trees, random forest, and k-means. All of these algorithms are, to a greater or lesser degree, capable of detecting anomalies in big data, but we need to know how efficiently each one performs. The main objective of this investigation is to find the most efficient algorithm for anomaly detection. To that end, we compare their training time, prediction time, and accuracy. The analysis was carried out on the KDD Cup 99 dataset. Although this dataset is only megabytes in size, it serves our purpose here for big data security analytics.
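The comparison described in the abstract (training time, prediction time, and accuracy per algorithm) can be illustrated with a minimal timing harness. This is only a sketch under stated assumptions, not the authors' code: it uses plain Python rather than Apache Spark, the `benchmark` function and the `MajorityClassifier` baseline are hypothetical names introduced for illustration, and the tiny labelled sample is made up rather than taken from the KDD Cup 99 dataset.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy baseline: always predicts the most common training label."""

    def fit(self, X, y):
        # Remember the label that occurs most often in the training data.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Return the remembered label for every input row.
        return [self.label_ for _ in X]


def benchmark(model, X_train, y_train, X_test, y_test):
    """Measure training time, prediction time, and accuracy for one model."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    predict_time = time.perf_counter() - start

    correct = sum(p == t for p, t in zip(predictions, y_test))
    return {
        "train_time_s": train_time,
        "predict_time_s": predict_time,
        "accuracy": correct / len(y_test),
    }


# Made-up sample data (assumption): two numeric features, labels "normal"/"attack".
X_train = [[0.1, 2.0], [0.2, 1.9], [5.0, 0.1], [0.15, 2.1]]
y_train = ["normal", "normal", "attack", "normal"]
X_test = [[0.12, 2.0], [4.8, 0.2]]
y_test = ["normal", "attack"]

result = benchmark(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(result)
```

Any object exposing `fit` and `predict` could be passed to `benchmark` in place of the baseline, so the same three measurements would be collected for each algorithm under comparison.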
Pages: 97-100
Page count: 4