Testing of Algorithms for Anomaly Detection in Big Data Using Apache Spark

Cited by: 0
Authors
Lighari, Sheeraz Niaz [1]
Hussain, Dil Muhammad Akbar [1]
Affiliation
[1] Aalborg University, Department of Energy Technology, Esbjerg, Denmark
Keywords
Big data; Security analytics; Machine learning
DOI
10.1109/CICN.2017.23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The constant growth in the size of networks, and in the volume of data they produce, has made data analysis very challenging, particularly once the data reaches the scale of big data; detecting intrusions at that scale is harder still. Experts currently have very few tools and methods for analyzing big data for security purposes. We therefore need either to devise new tools or to use existing tools in novel ways to achieve big data security analysis. In this paper, we use Apache Spark, a big data tool, to analyze a large dataset for anomaly detection. Anomaly detection is performed with several machine learning algorithms: logistic regression, support vector machine, naive Bayes, decision trees, random forest, and K-means. All of these algorithms can, to a greater or lesser degree, detect anomalies in big data, but we need to know how efficiently each one performs. The main objective of this investigation is to find the most efficient algorithm for anomaly detection. To this end, we compare their training time, prediction time, and accuracy. The analysis was carried out on the KDD Cup 99 dataset. Although this dataset is only megabytes in size, it serves our purpose here of big data security analytics.
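The comparison protocol described in the abstract (measure training time, prediction time, and accuracy for each algorithm on the same train/test split) can be sketched as below. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' Spark code: NearestCentroid and MajorityClass are hypothetical stand-in classifiers (not the Spark MLlib algorithms named above), and the tiny two-feature "normal"/"attack" dataset is invented for illustration, not taken from KDD Cup 99. In a Spark version, each stand-in would presumably be replaced by an estimator such as pyspark.ml.classification.LogisticRegression, whose fit(dataset) returns a model with a transform(dataset) method.

```python
import time
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Result:
    name: str
    train_seconds: float
    predict_seconds: float
    accuracy: float


class NearestCentroid:
    """Hypothetical stand-in classifier: predicts the label whose class mean is closest."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        # Divide each per-class feature sum by its count to get the class centroid.
        self.centroids = {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}
        return self

    def predict(self, xs: Sequence[Vector]) -> List[str]:
        def dist2(a: Vector, b: Vector) -> float:
            # Squared Euclidean distance; the square root is unnecessary for argmin.
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda y: dist2(x, self.centroids[y])) for x in xs]


class MajorityClass:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "MajorityClass":
        self.label = Counter(ys).most_common(1)[0][0]
        return self

    def predict(self, xs: Sequence[Vector]) -> List[str]:
        return [self.label for _ in xs]


def evaluate(name, model, train_x, train_y, test_x, test_y) -> Result:
    """Time training and prediction separately, then compute accuracy on the test split."""
    t0 = time.perf_counter()
    model.fit(train_x, train_y)
    t1 = time.perf_counter()
    preds = model.predict(test_x)
    t2 = time.perf_counter()
    correct = sum(p == y for p, y in zip(preds, test_y))
    return Result(name, t1 - t0, t2 - t1, correct / len(test_y))


# Invented toy data: "normal" traffic near (0, 0), "attack" traffic near (5, 5).
TRAIN_X = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (4.9, 5.2), (0.1, 0.2)]
TRAIN_Y = ["normal", "normal", "normal", "attack", "attack", "normal"]
TEST_X = [(0.0, 0.0), (5.0, 5.0), (4.8, 5.1), (0.3, -0.2)]
TEST_Y = ["normal", "attack", "attack", "normal"]

if __name__ == "__main__":
    for name, model in [("nearest-centroid", NearestCentroid()), ("majority", MajorityClass())]:
        r = evaluate(name, model, TRAIN_X, TRAIN_Y, TEST_X, TEST_Y)
        print(f"{r.name}: train={r.train_seconds:.6f}s "
              f"predict={r.predict_seconds:.6f}s accuracy={r.accuracy:.2f}")
```

On this toy split the nearest-centroid stand-in classifies all four test points correctly, while the majority baseline always answers "normal" and scores 0.5. One caveat if the same harness is ported to Spark: DataFrame transformations are lazy, so a prediction timer would likely need to wrap an action (for example a count() on the transformed DataFrame) to measure real work rather than just query planning.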
Pages: 97-100
Number of pages: 4