Clustering-based real-time anomaly detection: A breakthrough in big data technologies

Cited by: 46
Authors
Habeeb, Riyaz Ahamed Ariyaluran [1]
Nasaruddin, Fariza [1]
Gani, Abdullah [6]
Amanullah, Mohamed Ahzam [3]
Hashem, Ibrahim Abaker Targio [2]
Ahmed, Ejaz [4]
Imran, Muhammad [5]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
[2] Taylors Univ, Sch Comp & Informat Technol, Subang Jaya, Malaysia
[3] Telekom Res & Dev Sdn Bhd, Res & Innovat Dev, Cyberjaya, Malaysia
[4] Univ Malaya, Ctr Mobile Cloud Comp Res C4MCCR, Kuala Lumpur, Malaysia
[5] King Saud Univ, Coll Appl Comp Sci, Riyadh, Saudi Arabia
[6] Univ Malaya, Dept Comp Syst & Technol, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
Keywords
DETECTION SYSTEM; FRAMEWORK; INTERNET; MACHINE
DOI
10.1002/ett.3647
Chinese Library Classification
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
Of late, the ever-increasing use of connected Internet-of-Things devices has augmented the volume of high-velocity, real-time network data. At the same time, threats to networks have become inevitable; hence, identifying anomalies in real-time network data has become crucial. To date, most existing anomaly detection approaches focus mainly on machine learning techniques for batch processing. Meanwhile, detection approaches that focus on real-time analytics are deficient in detection accuracy while consuming more memory and requiring longer execution time. As such, this paper proposes a novel framework for real-time anomaly detection based on big data technologies. In addition, this paper develops a streaming sliding window local outlier factor coreset clustering (SSWLOFCC) algorithm, which was then implemented in the framework. The proposed framework, which comprises BroIDS, Flume, Kafka, Spark Streaming, Spark MLlib, Matplot, and HBase, was evaluated to substantiate its efficacy, particularly in terms of accuracy, memory consumption, and execution time. The evaluation was carried out as a critical comparative analysis against existing approaches, such as K-means, hierarchical density-based spatial clustering of applications with noise (HDBSCAN), isolation forest, spectral clustering, and agglomerative clustering. Moreover, the Adjusted Rand Index and a memory profiler package were used to evaluate the proposed framework against the existing approaches. The evaluation showed the efficacy of the proposed framework, with a much higher accuracy rate of 96.51% compared to the other algorithms. The proposed framework also outperformed the existing algorithms, with lower memory consumption and shorter execution time. Ultimately, the proposed solution enables analysts to precisely track and detect anomalies in real time.
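The abstract names the Adjusted Rand Index as one of the evaluation measures. As a rough illustration only, the sketch below computes that index in plain Python from two label lists. It is not the authors' evaluation code: the function name `adjusted_rand_index` and the toy labels are assumptions made for this example, and the record does not say how the paper derived its accuracy figure from the index.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same points.

    Hypothetical helper for illustration; returns 1.0 for identical
    partitions and values near 0.0 for chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same points")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one point: the partitions trivially agree

    # Contingency counts: points sharing a (true, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of point pairs placed together under each view.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Toy labels: 0 = normal traffic, 1 = anomaly (made up for this example).
    truth = [0, 0, 0, 1, 1, 0]
    predicted = [1, 1, 1, 0, 0, 1]  # same partition, cluster names swapped
    print(adjusted_rand_index(truth, predicted))
```

The index depends only on which points are grouped together, not on the cluster names, which is why it suits comparing unsupervised clustering output against ground-truth labels.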
Pages: 27