A cosine similarity-based labeling technique for vulnerability type detection using source codes

Times cited: 0
Author
Ozturk, M. Maruf [1]
Affiliation
[1] Suleyman Demirel Univ, Engn & Nat Sci Fac, Dept Comp Engn, Isparta, Turkiye
Keywords
Vulnerability detection; Cosine similarity; Generalized linear model; Labeling; Text encoding
DOI
10.1016/j.cose.2024.104059
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Vulnerability detection is essential for making software systems reliable. Although existing methods achieve remarkable success in vulnerability detection, they have two drawbacks. (1) Because source code has a high noise ratio, these methods remove irrelevant information from it before applying deep learning, and their experiments report high accuracy. However, deep learning-based detection methods require large-scale datasets, which makes vulnerability detection computationally difficult for small-scale software systems. (2) Most studies perform feature selection by processing vulnerability commits; despite considerable effort, few works detect vulnerabilities directly from source code. To solve these two problems, this study proposes a novel labeling and vulnerability detection algorithm. The algorithm first analyzes source code with the help of a keyword vulnerability matrix. It then uses word2vec to generate a final encoded matrix and combines the labeling vector with the source code matrix to produce a trainable dataset for a generalized linear model (GLM). Unlike previous studies, our method detects vulnerabilities from source code without requiring vulnerability commits. In addition, similar studies generally aim to provide sophisticated solutions for a single programming language. By contrast, our study develops vulnerability keywords for three programming languages (C#, Java, and C++) and creates the corresponding labeling vectors based on the keyword matrix. The proposed method outperformed the baseline approaches on most of the experimental datasets, with an area under the curve (AUC) above 90%. Furthermore, across five types of vulnerabilities, there is an average margin of 7.7% between our method and the alternatives in Recall, Precision, and F1-score.
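The pipeline summarized in the abstract (keyword-based labeling, text encoding, then GLM training) can be illustrated with a minimal Python sketch. It is not the author's implementation and makes several loud assumptions: the keyword lists in VULN_KEYWORDS and the 0.1 similarity threshold are hypothetical, a plain bag-of-words count vector stands in for the word2vec encoding, and the GLM is assumed to be logistic regression (binomial family, logit link) fitted by batch gradient descent. Labels come from the cosine similarity, named in the title, between a snippet's token-count vector and the keyword vector of one vulnerability type.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL keyword lists: the paper builds a keyword vulnerability matrix
# for C#, Java, and C++, but its actual keywords are not reproduced here.
VULN_KEYWORDS = {
    "buffer_overflow": ["strcpy", "strcat", "gets", "sprintf", "memcpy"],
    "sql_injection": ["executequery", "sqlcommand", "statement", "concat"],
}

def tokenize(source):
    """Split source code into lowercase identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", source.lower())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label(source, vuln_type, threshold=0.1):
    """Return 1 if the snippet's token vector is close enough to the keyword
    vector of vuln_type (the 0.1 threshold is an assumption), else 0."""
    keywords = Counter(VULN_KEYWORDS[vuln_type])
    return int(cosine(Counter(tokenize(source)), keywords) >= threshold)

def featurize(source, vocab):
    """Bag-of-words counts over a fixed vocabulary (a stand-in for word2vec)."""
    counts = Counter(tokenize(source))
    return [float(counts[w]) for w in vocab]

def sigmoid(z):
    """Numerically stable logistic function (inverse of the logit link)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_glm(X, y, lr=0.5, epochs=500):
    """Fit a binomial GLM with logit link (logistic regression) by batch
    gradient descent on the mean negative log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that feature vector x is vulnerable."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

if __name__ == "__main__":
    snippets = [
        'strcpy(buf, input); printf("%s", buf);',
        "gets(line); strcat(dst, line);",
        "int total = a + b; return total;",
        "for (int i = 0; i < n; i++) sum += v[i];",
    ]
    # Label each snippet, then pair the labels with the encoded matrix.
    labels = [label(s, "buffer_overflow") for s in snippets]
    vocab = sorted({t for s in snippets for t in tokenize(s)})
    X = [featurize(s, vocab) for s in snippets]
    w, b = train_glm(X, labels)
    print(labels)  # [1, 1, 0, 0]
    print([round(predict_proba(w, b, x), 2) for x in X])
```

Run as a script, the sketch labels the two snippets that call strcpy, gets, or strcat as buffer_overflow (1) and the other two as 0, then fits the GLM on the labeled matrix. Storing vectors as Counters keeps the dot product limited to the tokens two vectors share; replacing featurize with real word2vec embeddings would turn the feature rows into dense vectors without changing the labeling or GLM steps.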
Pages: 13
Related papers
  • [1] A Similarity-Based PMU Error Detection Technique
    Idehen, Ikponmwosa
    Overbye, Thomas
    [J]. 2017 19TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEM APPLICATION TO POWER SYSTEMS (ISAP), 2017.
  • [2] Cosine Similarity-Based Pruning for Concept Discovery
    Dogan, Abdullah
    Mutlu, Alev
    Karagoz, Pinar
    [J]. COMPUTER AND INFORMATION SCIENCES, ISCIS 2016, 2016, 659 : 90 - 96
  • [3] Efficient Cosine Similarity-based Image Correlation Algorithm for Object Detection and Localization
    Qin, Shun
    Shao, Hang
    Wang, Zongyu
    Shi, Kejun
    Gao, Chengtao
    Zhang, Jifeng
    [J]. OPTOELECTRONIC IMAGING AND MULTIMEDIA TECHNOLOGY IX, 2022, 12317
  • [4] A Cosine Similarity-based Compensation Strategy for RSS Detection Variance in Indoor Localization
    Wang, Lei
    Wu, Xiao
    Zheng, Baoyu
    Cu, Jingwu
    Zhou, Hui
    [J]. 2016 26TH INTERNATIONAL TELECOMMUNICATION NETWORKS AND APPLICATIONS CONFERENCE (ITNAC), 2016: 47 - 52
  • [5] A cosine similarity-based negative selection algorithm for time series novelty detection
    Dong, Yonggui
    Sun, Zhaoyan
    Ha, Huibo
    [J]. MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2006, 20 (06) : 1461 - 1472
  • [6] Evaluation of a similarity-based elastography technique using four similarity metrics
    Rothney, MP
    Washington, CW
    Miga, MI
    [J]. 2004 2ND IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING: MACRO TO NANO, VOLS 1 AND 2, 2004: 696 - 699
  • [7] A Cosine Similarity-Based Centralized Protection Scheme for dc Microgrids
    Mohanty, Rabindra
    Sahoo, Subham
    Pradhan, Ashok Kumar
    Blaabjerg, Frede
    [J]. IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN POWER ELECTRONICS, 2021, 9 (05) : 5646 - 5656
  • [8] A signature technique for similarity-based queries
    Faloutsos, C
    Jagadish, HV
    Mendelzon, AO
    Milo, T
    [J]. COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998: 2 - 20
  • [9] A similarity-based remaining useful life prediction method using multimodal degradation features and adjusted cosine similarity
    Kong, Chengcheng
    Yu, Wennian
    Zeng, Qiang
    Chen, Zixu
    Peng, Yizhen
    [J]. MEASUREMENT SCIENCE AND TECHNOLOGY, 2023, 34 (10)
  • [10] A cosine similarity-based corporate default prediction model and empirical evidence
    Shen, Long
    Zhou, Ying
    Zhao, Xuanduo
    [J]. Xitong Gongcheng Lilun yu Shijian/System Engineering Theory and Practice, 2022, 42 (07): 1826 - 1842