Estimating vulnerability metrics with word embedding and multiclass classification methods

Cited by: 2
Authors
Kekul, Hakan [1]
Ergen, Burhan [2]
Arslan, Halil [3]
Affiliations
[1] Sivas Cumhuriyet Univ, Fac Technol, Dept Software Engn, Sivas, Turkiye
[2] Firat Univ, Fac Engn, Dept Comp Engn, Elazig, Turkiye
[3] Sivas Cumhuriyet Univ, Fac Engn, Dept Comp Engn, Sivas, Turkiye
Keywords
Software security; Software vulnerability; Information security; Text analysis; Multiclass classification; Text classification; Selection; Impact
DOI
10.1007/s10207-023-00734-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Cyber security has grown in importance as information technologies have become an inseparable part of modern life. One of the fundamental areas of cyber security is software security: vulnerabilities in software are one of the main reasons information systems are exploited. For this reason, vulnerabilities have long been systematically reported, analyzed and classified under a protocol established between states and the relevant stakeholders. Today, all of these processes are carried out manually, which introduces human errors and delays. This study therefore aims to support experts by speeding up these processes and increasing the accuracy of the analysis results. To this end, we propose a model that takes the natural-language technical descriptions in security reports as input and combines word embedding approaches from natural language processing with multiclass classification algorithms. To allow a reliable comparison, we chose the National Vulnerability Database (NVD), which is publicly available and widely accepted as a reference. We also compared the proposed model with previous studies in the literature; to make these comparisons fair, our model was trained on the datasets used by each compared study, and the results are presented clearly. The proposed method achieved estimation success rates of 87.34-96.25% for CVSS 2.0 metrics and 84-90% for CVSS 3.1 metrics. This study, which combines different word embedding and classification algorithms, is one of the few studies on the latest version of the official scoring system used to classify software security vulnerabilities.
Moreover, given the size of its dataset and the number of databases evaluated, it is the most comprehensive and original study in its field.
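The abstract describes the pipeline only at a high level. As a minimal sketch, assuming CVSS v3.1 base-metric vector strings as labels, the Python below treats each base metric as its own multiclass target, represents a description by averaging word vectors, and classifies with a nearest-centroid rule. The hash-seeded random word vectors are a stand-in for trained embeddings (e.g. word2vec), the nearest-centroid classifier is swapped in for illustration, and all names (`parse_cvss31_vector`, `embed`, `NearestCentroidCVSS`) and the toy data are hypothetical; this is not the authors' implementation.

```python
import hashlib
import math
import random
import re
from collections import defaultdict

# Allowed values of each CVSS v3.1 base metric (temporal/environmental metrics not handled).
CVSS31_BASE = {
    "AV": {"N", "A", "L", "P"}, "AC": {"L", "H"}, "PR": {"N", "L", "H"},
    "UI": {"N", "R"}, "S": {"U", "C"},
    "C": {"H", "L", "N"}, "I": {"H", "L", "N"}, "A": {"H", "L", "N"},
}
DIM = 64  # embedding dimension of the stand-in word vectors


def parse_cvss31_vector(vector: str) -> dict:
    """Split a CVSS v3.1 base vector into one class label per metric."""
    prefix, _, body = vector.partition("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    labels = {}
    for part in body.split("/"):
        metric, _, value = part.partition(":")
        if metric not in CVSS31_BASE or value not in CVSS31_BASE[metric]:
            raise ValueError(f"bad metric component: {part!r}")
        labels[metric] = value
    if set(labels) != set(CVSS31_BASE):
        raise ValueError("vector is missing base metrics")
    return labels


def word_vector(token: str) -> list:
    # Stand-in for a learned embedding: a fixed pseudo-random vector seeded by MD5 of the token.
    seed = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def embed(text: str) -> list:
    """Represent a description as the mean of its token vectors."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    total = [0.0] * DIM
    for tok in tokens:
        for i, x in enumerate(word_vector(tok)):
            total[i] += x
    return [x / len(tokens) for x in total] if tokens else total


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class NearestCentroidCVSS:
    """One nearest-centroid multiclass classifier per CVSS base metric."""

    def fit(self, descriptions, vectors):
        sums = defaultdict(dict)  # sums[metric][label] -> summed embeddings
        counts = defaultdict(lambda: defaultdict(int))  # class sizes
        for text, vector in zip(descriptions, vectors):
            emb = embed(text)
            for metric, label in parse_cvss31_vector(vector).items():
                acc = sums[metric].setdefault(label, [0.0] * DIM)
                for i, x in enumerate(emb):
                    acc[i] += x
                counts[metric][label] += 1
        # Centroid = mean embedding of all training descriptions with that label.
        self.centroids = {
            metric: {lab: [x / counts[metric][lab] for x in acc] for lab, acc in per_label.items()}
            for metric, per_label in sums.items()
        }
        return self

    def predict(self, text: str) -> dict:
        # For every metric, pick the label whose centroid is most cosine-similar.
        emb = embed(text)
        return {
            metric: max(per_label, key=lambda lab: cosine(emb, per_label[lab]))
            for metric, per_label in self.centroids.items()
        }


if __name__ == "__main__":
    docs = [
        "Buffer overflow allows remote attackers to execute arbitrary code via crafted network packets.",
        "Local users can read sensitive files because of insecure default permissions.",
    ]
    vecs = [
        "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
        "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N",
    ]
    model = NearestCentroidCVSS().fit(docs, vecs)
    print(model.predict("Remote attackers send crafted packets to cause a buffer overflow."))
```

Framing each metric as a separate multiclass problem is what allows accuracy to be reported per metric. A realistic version would train embeddings on vulnerability descriptions and use a stronger classifier than nearest centroid.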
Pages: 247-270
Page count: 24