Efficient k-means Using Triangle Inequality on Spark for Cyber Security Analytics

Cited by: 6
Authors
Chitrakar, Ambika Shrestha [1]
Petrovic, Slobodan [1]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Gjovik, Norway
Keywords
k-means Clustering; Triangle Inequality; Security Analytics; Apache Spark; Web Attacks
DOI
10.1145/3309182.3309187
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
With advances in technology and the growing number of digital sources, the quantity of data increases every day and, with it, the quantity of cyber security related data. Traditional security systems such as Intrusion Detection Systems (IDS) cannot handle such growing volumes of data in real time. Cyber security analytics is an alternative to such traditional security systems: it can use big data analytics techniques to provide a faster and more scalable framework for handling large amounts of cyber security related data in real time. k-means clustering is one of the clustering algorithms commonly used in cyber security analytics. It aims to divide security related data into groups of similar entities, which in turn can help in gaining important insights into known and unknown attack patterns. This technique lets a security analyst focus the analysis on the data belonging to selected clusters only. To improve performance, k-means can exploit the triangle inequality to skip many point-center distance computations without affecting the clustering results. In this paper, we re-formulate the parallel version of Elkan's k-means with triangle inequality (k-meansTI) algorithm, implement it on Apache Spark, and use it to classify Web attacks into different clusters. The paper also compares the speed of our parallel k-meansTI on Spark with that of the Spark ML k-means clustering algorithm.
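The pruning idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's parallel Spark implementation, and it covers only the basic center-to-center test: if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so the distance from x to c' need not be computed. Elkan's full algorithm additionally maintains upper and lower bounds across iterations, which is omitted here. The function names `assign_with_pruning` and `assign_brute_force` and the synthetic data are assumptions made for illustration.

```python
import math
import random


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary."""
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)]
          for i in range(k)]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        for j in range(1, k):
            # d(c_best, c_j) >= 2 * d(x, c_best) implies
            # d(x, c_j) >= d(x, c_best), so c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = math.dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centers):
    """Reference assignment that computes every point-center distance."""
    return [min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in points]


# Hypothetical synthetic data: three well-separated 2-D clusters.
random.seed(0)
true_centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
points = [(cx + random.gauss(0, 1), cy + random.gauss(0, 1))
          for cx, cy in true_centers for _ in range(100)]

labels, skipped = assign_with_pruning(points, true_centers)
print("assignments match brute force:",
      labels == assign_brute_force(points, true_centers))
print("point-center distances skipped:", skipped)
```

The pruned pass returns the same assignments as the brute-force pass because a skipped center is provably no closer than the current best, which is the sense in which the clustering results are unaffected.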
Pages: 37-45
Number of pages: 9
Related papers
50 in total
  • [1] Efficient Parallel K-Means on MapReduce Using Triangle Inequality
    Al Ghamdi, Sami
    Di Fatta, Giuseppe
    [J]. 2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI, 2017, : 985 - 992
  • [2] Analyzing Digital Evidence Using Parallel k-means with Triangle Inequality on Spark
    Chitrakar, Ambika Shrestha
    Petrovic, Slobodan
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 3049 - 3058
  • [3] KPynq: A Work-Efficient Triangle-Inequality based K-means on FPGA
    Wang, Yuke
    Zeng, Zhaorui
    Feng, Boyuan
    Deng, Lei
    Ding, Yufei
    [J]. 2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, : 320 - 320
  • [4] A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality
    Kwedlo, Wojciech
    Czochanski, Pawel J.
    [J]. IEEE ACCESS, 2019, 7 : 42280 - 42297
  • [5] An empirical evaluation of strategies based on the triangle inequality for accelerating the k-means algorithm
    Matte, Marcelo Kuchar
    do Carmo Nicoletti, Maria
    [J]. International Journal of Innovative Computing and Applications, 2022, 13 (04) : 198 - 209
  • [6] TiAcc: Triangle-inequality based Hardware Accelerator for K-means on FPGAs
    Wang, Yuke
    Feng, Boyuan
    Li, Gushu
    Tzimpragos, Georgios
    Deng, Lei
    Xie, Yuan
    Ding, Yufei
    [J]. 21ST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2021), 2021, : 133 - 142
  • [8] Using a Set of Triangle Inequalities to Accelerate K-means Clustering
    Yu, Qiao
    Chen, Kuan-Hsun
    Chen, Jian-Jia
    [J]. SIMILARITY SEARCH AND APPLICATIONS, SISAP 2020, 2020, 12440 : 297 - 311
  • [9] A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality
    Wang, Xueyi
    [J]. 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 1293 - 1299
  • [10] Users Segmentation Based on Google Analytics Income Using K-Means
    La Cruz, Alexandra
    Severeyn, Erika
    Matute, Roberto
    Estrada, Juan
    [J]. INFORMATION AND COMMUNICATION TECHNOLOGIES (TICEC 2021), 2021, 1456 : 225 - 235