Vulnerability Detection via Multiple-Graph-Based Code Representation

Cited by: 0
Authors
Qiu, Fangcheng [1]
Liu, Zhongxin [1]
Hu, Xing [2]
Xia, Xin [3]
Chen, Gang [4]
Wang, Xinyu [4]
Affiliations
[1] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Sch Software Technol, Ningbo 315103, Zhejiang, Peoples R China
[3] Huawei, Software Engn Applicat Technol Lab, Hangzhou 310051, Zhejiang, Peoples R China
[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Codes; Source coding; Graph neural networks; Software; Feature extraction; Deep learning; Vulnerability detection; deep learning; code representation; graph neural network;
DOI
10.1109/TSE.2024.3427815
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
During software development and maintenance, vulnerability detection is an essential part of software quality assurance. Although many program-analysis-based and machine-learning-based approaches have been proposed to automatically detect vulnerabilities, they rely on explicit rules or patterns defined by security experts and suffer from either high false-positive or high false-negative rates. Recently, an increasing number of studies have leveraged deep learning techniques, especially Graph Neural Networks (GNNs), to detect vulnerabilities. These approaches use program analysis to represent program semantics as graphs and perform graph analysis to detect vulnerabilities. However, they suffer from two main problems: (i) existing GNN-based techniques do not effectively learn the structural and semantic features of source code for vulnerability detection; (ii) these approaches tend to ignore fine-grained information in source code. To tackle these problems, we propose a novel vulnerability detection approach, named MGVD (Multiple-Graph-Based Vulnerability Detection), to detect vulnerable functions. To effectively learn the structural and semantic features of source code, MGVD represents each function in three different forms, i.e., two statement graphs and a sequence of tokens. We then encode these representations into a three-channel feature matrix, which contains both the structural and the semantic features of the function, and add a weight allocation layer to distribute the weights between structural and semantic features. To overcome the second problem, MGVD constructs each graph representation of the input function using multiple different graphs instead of a single graph. Each graph focuses on one statement in the function, and its nodes denote the related statements and their fine-grained code elements. Finally, MGVD leverages a CNN to identify whether the function is vulnerable based on this feature matrix. We conduct experiments on 3 vulnerability datasets with a total of 30,341 vulnerable functions and 127,931 non-vulnerable functions. The experimental results show that our method outperforms the state of the art by 9.68% to 10.28% in terms of F1-score.
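The three-channel feature matrix and the weight allocation step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how the channels are encoded or how the weights are computed, so the function names (`build_feature_matrix`, `allocate_weights`), the per-statement row layout, and the use of a softmax over one score per channel are all assumptions made for illustration. The CNN classifier that consumes the matrix is left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_feature_matrix(graph_a, graph_b, token_seq):
    """Stack three encodings of one function into a 3-channel matrix.

    Assumption: each encoding is already a list of rows (one row per
    statement) and all three share the same shape.
    """
    if not graph_a or not graph_a[0]:
        raise ValueError("channels must be non-empty")
    shape = (len(graph_a), len(graph_a[0]))
    for channel in (graph_a, graph_b, token_seq):
        if len(channel) != shape[0] or any(len(r) != shape[1] for r in channel):
            raise ValueError("all channels must share one shape")
    return [graph_a, graph_b, token_seq]


def allocate_weights(feature_matrix, channel_scores):
    """Scale each channel by a softmax weight.

    Assumption: one score per channel stands in for whatever the
    weight allocation layer learns during training.
    """
    weights = softmax(channel_scores)
    return [
        [[w * value for value in row] for row in channel]
        for w, channel in zip(weights, feature_matrix)
    ]


# Hypothetical 2-statement function with 3 features per statement.
structural_1 = [[0.2, 0.1, 0.0], [0.4, 0.3, 0.1]]  # first statement graph
structural_2 = [[0.0, 0.5, 0.2], [0.1, 0.1, 0.6]]  # second statement graph
semantic = [[0.9, 0.0, 0.3], [0.2, 0.7, 0.4]]      # token sequence

matrix = build_feature_matrix(structural_1, structural_2, semantic)
weighted = allocate_weights(matrix, [0.5, 0.5, 1.0])
```

In this sketch the weighted three-channel matrix is what a CNN would take as input, in the same way an image classifier takes a three-channel image.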
Pages: 2178-2199
Page count: 22