Estimating vulnerability metrics with word embedding and multiclass classification methods

Times cited: 2
Authors
Kekul, Hakan [1]
Ergen, Burhan [2]
Arslan, Halil [3]
Affiliations
[1] Sivas Cumhuriyet Univ, Fac Technol, Dept Software Engn, Sivas, Turkiye
[2] Firat Univ, Fac Engn, Dept Comp Engn, Elazig, Turkiye
[3] Sivas Cumhuriyet Univ, Fac Engn, Dept Comp Engn, Sivas, Turkiye
Keywords
Software security; Software vulnerability; Information security; Text analysis; Multiclass classification; Text classification; Selection; Impact
DOI
10.1007/s10207-023-00734-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cyber security has grown in importance as information technologies have become an inseparable part of modern life. Software security is one of its fundamental areas, and vulnerabilities in software are one of the main causes of information systems being exploited. For this reason, vulnerabilities have long been systematically reported, analyzed, and classified under a protocol established between states and the relevant stakeholders. These processes are still carried out manually, which makes them prone to human error and delay. This study therefore aims to support experts by speeding up these processes and improving the accuracy of the analysis results. To this end, we propose a model that works from the technical descriptions in security reports, which are written in natural language. The model combines word embedding approaches from natural language processing with multiclass classification algorithms. For a reliable comparison, the publicly available National Vulnerability Database (NVD), which is widely accepted as a reference, was chosen. The proposed model was also compared with previous studies in the literature; so that the results could be compared fairly, our model was trained on the datasets used in each of the compared studies. The proposed method achieved estimation success rates of 87.34% to 96.25% for Common Vulnerability Scoring System (CVSS) 2.0 metrics and 84% to 90% for CVSS 3.1 metrics. By using different word embedding and classification algorithms together, this study is one of the few to address the latest version of the official scoring system used to classify software security vulnerabilities. Moreover, given the size of its dataset and the number of databases evaluated, it is the most comprehensive and original study in its field.
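The abstract describes the pipeline only at a high level. The sketch below illustrates the general idea under stated assumptions: a vulnerability description is turned into a fixed-length vector by averaging word embeddings, and a multiclass classifier predicts the value of one CVSS metric, here the CVSS 3.1 Attack Vector (the Adjacent value is omitted for brevity). The `EMBEDDINGS` table, the toy descriptions, and all function names are hypothetical; in practice the vectors would come from a trained embedding model and the labelled descriptions from NVD records. Multinomial logistic regression is used only because it fits in a few lines of standard-library Python; it is not necessarily one of the classifiers evaluated in the paper.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for pretrained
# embeddings (e.g. word2vec or GloVe); real vectors have far more dimensions.
EMBEDDINGS = {
    "remote":   [1.0, 0.0, 0.0],
    "network":  [0.9, 0.1, 0.0],
    "http":     [0.8, 0.0, 0.1],
    "local":    [0.0, 1.0, 0.0],
    "user":     [0.1, 0.8, 0.0],
    "file":     [0.0, 0.7, 0.1],
    "physical": [0.0, 0.0, 1.0],
    "usb":      [0.0, 0.1, 0.9],
    "device":   [0.1, 0.1, 0.7],
}
DIM = 3


def embed(text):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(weights, x):
    """Linear score of feature vector x for every class."""
    return [sum(wi * xi for wi, xi in zip(wk, x)) for wk in weights]


def train(docs, labels, classes, epochs=500, lr=0.5):
    """Multinomial logistic regression fitted with per-sample gradient descent."""
    # One weight vector per class: DIM feature weights plus a bias term.
    weights = [[0.0] * (DIM + 1) for _ in classes]
    data = [(embed(d) + [1.0], classes.index(y)) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        for x, k in data:
            probs = softmax(scores_for(weights, x))
            for c in range(len(classes)):
                # Gradient of the cross-entropy loss w.r.t. class c's score.
                grad = probs[c] - (1.0 if c == k else 0.0)
                weights[c] = [wi - lr * grad * xi for wi, xi in zip(weights[c], x)]
    return weights


def predict(weights, classes, text):
    """Return the class with the highest score for a description."""
    scores = scores_for(weights, embed(text) + [1.0])
    return classes[scores.index(max(scores))]


# Hypothetical toy descriptions labelled with CVSS 3.1 Attack Vector values;
# real training data would come from NVD records.
CLASSES = ["NETWORK", "LOCAL", "PHYSICAL"]
DOCS = [
    "remote attacker sends crafted http request over the network",
    "remote code execution via network service",
    "local user can overwrite a file",
    "local user gains privileges through a crafted file",
    "attacker with physical access inserts a usb device",
    "physical access to the device bypasses authentication",
]
LABELS = ["NETWORK", "NETWORK", "LOCAL", "LOCAL", "PHYSICAL", "PHYSICAL"]

WEIGHTS = train(DOCS, LABELS, CLASSES)

if __name__ == "__main__":
    print(predict(WEIGHTS, CLASSES, "remote attacker exploits the http interface"))
```

Because each CVSS metric takes one of several categorical values, each metric is naturally a multiclass problem. One way to cover all metrics is to train a separate classifier per metric with the same embedding step; this is an assumption for illustration, not necessarily the design used in the paper.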
Pages: 247-270
Number of pages: 24