Estimating vulnerability metrics with word embedding and multiclass classification methods

Cited by: 2
Authors
Kekul, Hakan [1]
Ergen, Burhan [2]
Arslan, Halil [3]
Affiliations
[1] Sivas Cumhuriyet Univ, Fac Technol, Dept Software Engn, Sivas, Turkiye
[2] Firat Univ, Fac Engn, Dept Comp Engn, Elazig, Turkiye
[3] Sivas Cumhuriyet Univ, Fac Engn, Dept Comp Engn, Sivas, Turkiye
Keywords
Software security; Software vulnerability; Information security; Text analysis; Multiclass classification; Text classification; Selection; Impact
DOI
10.1007/s10207-023-00734-7
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cyber security has grown steadily in importance since information technologies became an integral part of modern life. Software security is one of its fundamental areas, and security vulnerabilities in software are one of the main reasons information systems are exploited. For this reason, vulnerabilities have long been systematically reported, analyzed and classified under a protocol established between states and the relevant stakeholders. Today, all of these processes are carried out manually, which introduces human error and delay. The current study therefore aims to support experts by speeding up these processes and increasing the accuracy of the analysis results. To achieve this goal, we propose a model that uses the technical descriptions in security reports, which are written in natural language. The model combines word embedding approaches with multi-class classification algorithms from natural language processing. To allow a sound comparison, the NVD database, which is publicly available and accepted as a reference, was chosen as the data source. The proposed model was also compared with previous studies in the literature; so that the results could be compared fairly, our model was trained on the data sets used in those studies and the results are reported. The proposed method achieved estimation success in the range of 87.34-96.25% for CVSS 2.0 metrics and in the range of 84-90% for CVSS 3.1. This study, which uses different word embedding and classification algorithms together, is one of the few studies on the latest version of the official scoring system used to classify software security vulnerabilities. Moreover, owing to the size of the dataset it uses and the number of databases evaluated, we regard it as the most comprehensive and original study in its field.
Pages: 247-270
Page count: 24