Fine-grained Commit-level Vulnerability Type Prediction by CWE Tree Structure

Cited by: 10
Authors
Pan, Shengyi [1]
Bao, Lingfeng [1]
Xia, Xin [2]
Lo, David [3]
Li, Shanping [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Huawei, Shenzhen, Peoples R China
[3] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
Funding
U.S. National Science Foundation; National Research Foundation, Singapore
Keywords
Software Security; Vulnerability Type; CWE; Classification
DOI
10.1109/ICSE48619.2023.00088
Chinese Library Classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Identifying security patches via code commits to allow early warnings and timely fixes for Open Source Software (OSS) has received increasing attention. However, the existing detection methods can only identify the presence of a patch (i.e., a binary classification) but fail to pinpoint the vulnerability type. In this work, we take the first step to categorize the security patches into fine-grained vulnerability types. Specifically, we use the Common Weakness Enumeration (CWE) as the label and perform fine-grained classification using categories at the third level of the CWE tree. We first formulate the task as a Hierarchical Multi-label Classification (HMC) problem, i.e., inferring a path (a sequence of CWE nodes) from the root of the CWE tree to the node at the target depth. We then propose an approach named TREEVUL with a hierarchical and chained architecture, which manages to utilize the structure information of the CWE tree as prior knowledge of the classification task. We further propose a tree structure aware and beam search based inference algorithm for retrieving the optimal path with the highest merged probability. We collect a large security patch dataset from NVD, consisting of 6,541 commits from 1,560 GitHub OSS repositories. Experimental results show that TREEVUL significantly outperforms the best performing baselines, with improvements of 5.9%, 25.0%, and 7.7% in terms of weighted F1-score, macro F1-score, and MCC, respectively. We further conduct a user study and a case study to verify the practical value of TREEVUL in enriching the binary patch detection results and improving the data quality of NVD, respectively.
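The inference step described in the abstract (beam search over a label tree, returning the root-to-target-depth path with the highest merged probability) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the TREEVUL implementation: the toy tree and its node names (`A`, `B1a`, ...) are hypothetical stand-ins for CWE nodes, the per-node probabilities are made-up classifier outputs for a single commit, and the "merged" probability is assumed to be the product of the conditional probabilities along the path.

```python
import math

# Hypothetical toy label tree (stand-in for the CWE tree): parent -> children.
TREE = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": ["A1a", "A1b"],
    "A2": ["A2a"],
    "B1": ["B1a", "B1b"],
}

# Hypothetical classifier outputs for one commit: P(node | parent).
COND_PROB = {
    "A": 0.55, "B": 0.45,
    "A1": 0.6, "A2": 0.4,
    "B1": 1.0,
    "A1a": 0.6, "A1b": 0.4,
    "A2a": 1.0,
    "B1a": 0.9, "B1b": 0.1,
}


def beam_search(tree, cond_prob, depth, beam_width):
    """Return (path, probability) of the best path of length `depth`.

    Assumes the path score is the product of per-level conditional
    probabilities, accumulated as a sum of logs. Only children of the
    current node are expanded, so every candidate is a valid tree path.
    """
    beams = [(0.0, ["root"])]  # (log-probability, path so far)
    for _ in range(depth):
        candidates = []
        for score, path in beams:
            for child in tree.get(path[-1], []):
                candidates.append(
                    (score + math.log(cond_prob[child]), path + [child])
                )
        # Keep only the `beam_width` highest-scoring partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_path = beams[0]
    return best_path[1:], math.exp(best_score)


if __name__ == "__main__":
    print(beam_search(TREE, COND_PROB, depth=3, beam_width=1))
    print(beam_search(TREE, COND_PROB, depth=3, beam_width=2))
```

With these made-up numbers, a beam width of 1 (greedy decoding) commits to `A` at the first level and ends at `A1a`, while a beam width of 2 keeps `B` alive and recovers the higher-probability path through `B1a`. This is the usual motivation for preferring beam search to greedy top-down decoding in hierarchical classification.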
Pages: 957 - 969
Number of pages: 13