CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection

Cited by: 14
Authors
Tang, Wei [1]
Tang, Mingwei [1]
Ban, Minchao [1]
Zhao, Ziguo [1]
Feng, Mingjun [2]
Affiliations
[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Sichuan, Peoples R China
[2] State Grid Tibet Elect Power Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Graph neural networks; Vulnerability detection; Sequence embedding; Graph embedding; Pre-trained language model; Attention pooling; NEURAL-NETWORKS
DOI
10.1016/j.jss.2023.111623
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
To secure software, it is critical to detect potential vulnerabilities. Traditional static vulnerability detection methods are limited by predefined rules, which rely heavily on the expertise of developers. Existing deep learning-based vulnerability detection models usually use only a single sequence or graph embedding approach to extract vulnerability features. Sequence embedding-based models ignore the structural information inherent in the code, and graph embedding-based models lack effective node and graph embedding methods. To address these limitations, we propose a novel deep learning-based approach, CSGVD (Combining Sequence and Graph embedding for Vulnerability Detection), which treats function-level vulnerability detection as a graph binary classification task. First, we propose a PE-BL module, which inherits and enhances the knowledge from the pre-trained language model. It uses sequence embedding to extract the code's local semantic features as node embeddings in the control flow graph. Second, CSGVD uses graph neural networks to extract the structural information of the graph. Finally, we propose a mean biaffine attention pooling, M-BFA, to better aggregate node information into a graph-level feature representation. The experimental results show that CSGVD outperforms existing state-of-the-art models and obtains 64.46% accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection. (c) 2023 Elsevier Inc. All rights reserved.
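The idea of treating function-level detection as graph binary classification can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify PE-BL or M-BFA, so the sketch assumes hand-written node vectors in place of learned sequence embeddings, a single mean-neighbour message-passing step in place of the graph neural network, and a plain dot-product attention pooling in place of M-BFA. All function names, vectors, and weights below are hypothetical.

```python
import math

def propagate(features, edges):
    """One round of mean-neighbour message passing on a directed graph.

    Each node averages its own vector with those of its predecessors
    (a self-loop is added so a node with no predecessors keeps its vector).
    """
    neighbours = [[i] for i in range(len(features))]
    for src, dst in edges:
        neighbours[dst].append(src)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]

def attention_pool(features, query):
    """Softmax-weighted sum of node vectors (generic attention, not M-BFA)."""
    scores = [sum(f * q for f, q in zip(node, query)) for node in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(features[0])
    pooled = [sum(w * node[d] for w, node in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

def classify(graph_vector, weight, bias):
    """Logistic output: probability that the function is vulnerable."""
    logit = sum(g * w for g, w in zip(graph_vector, weight)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical control flow graph with three statements: 0 -> 1 -> 2.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in node embeddings
cfg_edges = [(0, 1), (1, 2)]

hidden = propagate(node_features, cfg_edges)
graph_vector, attn = attention_pool(hidden, query=[0.5, 0.5])
probability = classify(graph_vector, weight=[1.0, -1.0], bias=0.0)
print(hidden, attn, probability)
```

In a trained model the node vectors, the attention query, and the classifier weights would all be learned; here they are fixed only so that the data flow from node embeddings to a single graph-level prediction is visible.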
Pages: 11