FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm

Cited by: 30
Authors
Fang, Yong [1]
Liu, Yongcheng [1]
Huang, Cheng [1]
Liu, Liang [1]
Affiliation
[1] Sichuan University, College of Cybersecurity, Chengdu, Sichuan, People's Republic of China
Source
PLOS ONE, 2020, Vol. 15, Issue 2
DOI
10.1371/journal.pone.0228439
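Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09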
Abstract
In recent years, the number of vulnerabilities discovered and publicly disclosed has risen sharply. Attackers, however, exploit only a small fraction of them, so the value of a vulnerability to an attacker varies widely. Organizations with limited resources therefore need to rule out non-exploitable vulnerabilities quickly and prioritize patches accordingly. Recent work applies machine learning to predict exploited vulnerabilities from features extracted from open-source intelligence (OSINT). As the volume of vulnerability information grows explosively, however, these methods leave room for improvement when applied across multiple threat intelligence sources, and a more general method is needed. Moreover, previous methods processed vulnerability descriptions with traditional text-processing techniques, which capture only static statistical characteristics and ignore the context and meaning of the words. To address these challenges, we propose fastEmbed, an exploit prediction model that combines fastText with the LightGBM algorithm. We replicate key portions of the state-of-the-art work on exploit prediction and use them as benchmark models. By learning embeddings of vulnerability-related text on extremely imbalanced data sets, our model outperforms the baseline model in both generalization ability and prediction ability without temporal intermixing, with an average overall improvement of 6.283%. In predicting exploits in the wild, our model also outperforms the baseline model, with an F1 measure of 0.586 on the minority class (a 33.577% improvement over the work using features from the darkweb/deepweb). The results demonstrate that the model describes the exploitability of vulnerabilities more accurately and predicts exploits in the wild effectively.
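The abstract reports the F1 measure on the minority class rather than overall accuracy. The following is a minimal Python sketch of why that choice matters on extremely imbalanced data. The labels, the class ratio, and both "classifiers" are hypothetical toy values chosen for illustration; they are not the paper's data, model, or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def minority_f1(y_true, y_pred, positive=1):
    """F1 score computed only for the `positive` (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy set: 100 vulnerabilities, only 5 exploited (label 1).
y_true = [1] * 5 + [0] * 95

# Hypothetical predictor A: always answers "not exploited".
always_negative = [0] * 100

# Hypothetical predictor B: finds 3 of the 5 exploited cases
# and raises 2 false alarms among the non-exploited ones.
some_model = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

for name, pred in [("always_negative", always_negative),
                   ("some_model", some_model)]:
    print(name, accuracy(y_true, pred), minority_f1(y_true, pred))
```

In this toy setup the two predictors have nearly the same accuracy, yet only the second one identifies any exploited vulnerability, which the minority-class F1 makes visible.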
Pages: 28