Unstructured Big Data Threat Intelligence Parallel Mining Algorithm

Authors
Li, Zhihua [1]
Yu, Xinye [1]
Wei, Tao [1]
Qian, Junhao [2]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Sch IoT Engn, Wuxi 214122, Peoples R China
Source
BIG DATA MINING AND ANALYTICS | 2024 / Vol. 7 / Issue 02
Keywords
unstructured big data mining; parallel deep forest; multi-label classification algorithm; threat intelligence
DOI
10.26599/BDMA.2023.9020032
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
To efficiently mine threat intelligence from the large volume of open-source cybersecurity analysis reports on the web, we developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. First, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Next, the reports are annotated with five tactics category labels, creating a multi-label dataset for tactics classification. To address the low execution efficiency and poor scalability of the sequential deep forest algorithm, PDFMLC employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly improving its speedup ratio. PDFMLC also incorporates label mutual information from the established dataset as input features, which captures latent label associations and significantly improves classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm achieves strong node scalability and execution efficiency. The PDFMLC-TIM method, in turn, classifies the text of cybersecurity analysis reports and extracts tactics entities to construct comprehensive threat intelligence, which is then formatted as STIX 2.1 threat intelligence.
Pages: 531-546
Page count: 16
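
The abstract names the Lempel-Ziv-Welch (LZW) algorithm as one of the two mechanisms used to speed up PDFMLC, but this record does not say what data is compressed or how the compression is combined with broadcast variables. The following is only a minimal sketch of textbook LZW over bytes, not the authors' implementation; the function names `lzw_compress` and `lzw_decompress` are hypothetical, and the unbounded dictionary is a simplifying assumption.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Encode bytes as a list of LZW dictionary codes (hypothetical helper)."""
    # Start with one dictionary entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: List[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while it is already in the dictionary.
            current = candidate
        else:
            # Emit the longest known match and learn the new, longer sequence.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the original bytes from LZW codes (hypothetical helper)."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: previous match plus first byte of the current one.
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


sample = b"tactic:execution;tactic:execution;tactic:persistence;" * 20
encoded = lzw_compress(sample)
restored = lzw_decompress(encoded)
```

On repetitive input such as `sample`, the code list is much shorter than the input, which is the property that makes this kind of compression attractive for reducing the cost of shipping shared data between nodes.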