FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm

Cited by: 30
Authors
Fang, Yong [1 ]
Liu, Yongcheng [1 ]
Huang, Cheng [1 ]
Liu, Liang [1 ]
Affiliation
[1] Sichuan Univ, Coll Cybersecur, Chengdu, Sichuan, Peoples R China
Source
PLOS ONE | 2020 / Vol. 15 / Issue 02
DOI
10.1371/journal.pone.0228439
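Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code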
07 ; 0710 ; 09 ;
Abstract
In recent years, the number of vulnerabilities discovered and publicly disclosed has risen sharply. However, vulnerabilities differ in their value to attackers, and only a small fraction are ever exploited. Organizations therefore need to rule out non-exploitable vulnerabilities quickly and prioritize patches optimally with limited resources. Recent work uses machine learning to predict exploited vulnerabilities by extracting features from open-source intelligence (OSINT). However, given the explosive growth of vulnerability information, past methods leave room for improvement when applied to multiple threat intelligence sources, and a more general method is needed to handle them. Moreover, previous methods processed vulnerability-related descriptions with traditional text processing techniques, which capture only static statistical characteristics and ignore the context and meaning of the words. To address these challenges, we propose an exploit prediction model called fastEmbed, which combines the fastText and LightGBM algorithms. We replicate key portions of the state-of-the-art work on exploit prediction and use them as benchmark models. By learning embeddings of vulnerability-related text on extremely imbalanced data sets, our model outperforms the baseline model in both generalization ability and prediction ability without temporal intermixing, with an average overall improvement of 6.283%. In predicting exploits in the wild, our model also outperforms the baseline model, with an F1 measure of 0.586 on the minority class (a 33.577% improvement over the work using features from darkweb/deepweb). The results demonstrate that the model can better describe the exploitability of vulnerabilities and effectively predict exploits in the wild.
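
The abstract reports an F1 measure on the minority (exploited) class and a percentage improvement over a baseline. As a minimal sketch of how such figures are typically computed, the Python below derives F1 from confusion counts and converts between a score and a relative improvement. The function names and the example counts are hypothetical, and reading "improvement" as relative improvement, (new - baseline) / baseline, is an assumption that the abstract itself does not confirm.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for a single class from its true positives, false positives,
    and false negatives. Returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent.
    ASSUMPTION: this is how the reported percentages are defined."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, improvement_pct: float) -> float:
    """Baseline score implied by a new score and a relative improvement."""
    return new / (1.0 + improvement_pct / 100.0)


# Hypothetical confusion counts for the minority class (not from the paper).
example_f1 = f1_score(tp=30, fp=20, fn=30)
print(f"example minority-class F1: {example_f1:.3f}")

# Baseline F1 implied by the reported 0.586 and 33.577%, under the
# relative-improvement assumption stated above.
print(f"implied baseline F1: {implied_baseline(0.586, 33.577):.4f}")
```

F1 is computed per class because, on extremely imbalanced data, overall accuracy is dominated by the majority (non-exploited) class and says little about how well exploited vulnerabilities are detected.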
Pages: 28