Code Aggregate Graph: Effective Representation for Graph Neural Networks to Detect Vulnerable Code

Authors
Nguyen, Hoang Viet [1]
Zheng, Junjun [2]
Inomata, Atsuo [2]
Uehara, Tetsutaro [1]
Affiliations
[1] Ritsumeikan University, College of Information Science and Engineering, Kusatsu, 5258577, Japan
[2] Osaka University, Graduate School of Information Science and Technology, Osaka, 5650871, Japan
Abstract
Deep learning, especially graph neural networks (GNNs), provides efficient, fast, and automated methods to detect vulnerable code. However, the accuracy of previous studies was limited by existing code representations and leaves room for improvement. In addition, the diversity of embedding techniques and GNN models makes it hard to select an appropriate method. Herein we propose the Code Aggregate Graph (CAG) to improve vulnerability detection efficiency. CAG combines the principles of different code analyses, namely the abstract syntax tree, control flow graph, and program dependence graph, with dominator and post-dominator trees. This extensive representation gives deep graph networks more information for classification. We also implement different data encoding methods and neural networks to provide a multidimensional view of system performance. Specifically, three word embedding approaches and three deep GNNs are used to build classifiers. CAG is then evaluated on two datasets: a real-world open-source dataset and the Software Assurance Reference Dataset. CAG is also compared with seven state-of-the-art methods and six classic representations, and it shows the best performance. Compared to previous studies, CAG increases accuracy (by 5.4%) and F1-score (by 5.1%). Experiments further confirm that the encoding has a positive impact on accuracy (4-6%), whereas the network type does not. The study should contribute a meaningful benchmark for future research on code representations, data encoding, and GNNs. © 2013 IEEE.
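The abstract does not give the construction details of CAG, so the following is only a minimal sketch of the general idea: several program graphs over the same statements are merged into one edge-typed graph, with dominator and post-dominator tree edges derived from the control flow graph. The toy function, the hand-written AST and PDG edges, the helper names, and the edge-type labels (`AST`, `CFG`, `PDG`, `DOM`, `PDOM`) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Toy function (assumed): 1: x = read()   2: if x > 0
#                         3: y = x        4: y = -x      5: return y
cfg = {1: {2}, 2: {3, 4}, 3: {5}, 4: {5}, 5: set()}

def dominators(succ, entry):
    """Iterative data-flow computation of the dominator set of every node."""
    nodes = set(succ) | {t for ts in succ.values() for t in ts}
    pred = defaultdict(set)
    for s, ts in succ.items():
        for t in ts:
            pred[t].add(s)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = ({n} | set.intersection(*preds)) if preds else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def tree_edges(dom, entry):
    """Edges (idom(n), n); the immediate dominator is the strict dominator
    that itself has the largest dominator set (the one closest to n)."""
    edges = set()
    for n, ds in dom.items():
        strict = ds - {n}
        if n != entry and strict:
            edges.add((max(strict, key=lambda d: len(dom[d])), n))
    return edges

def reverse(succ):
    """Reverse every edge; post-dominators are dominators of this graph."""
    rev = {n: set() for n in succ}
    for s, ts in succ.items():
        for t in ts:
            rev.setdefault(t, set()).add(s)
    return rev

dom_edges = tree_edges(dominators(cfg, 1), 1)
pdom_edges = tree_edges(dominators(reverse(cfg), 5), 5)

# Hand-written stand-ins (assumed) for the output of a real parser/analyzer.
ast_edges = {(2, 3), (2, 4)}                      # branches nested under the `if`
pdg_edges = {(1, 2), (1, 3), (1, 4), (3, 5), (4, 5), (2, 3), (2, 4)}
cfg_edges = {(s, t) for s, ts in cfg.items() for t in ts}

# Aggregate graph: one set of (source, target, edge_type) triples.
cag = set()
for label, edges in [("AST", ast_edges), ("CFG", cfg_edges), ("PDG", pdg_edges),
                     ("DOM", dom_edges), ("PDOM", pdom_edges)]:
    cag |= {(s, t, label) for s, t in edges}
```

Keeping the edge type on every triple lets a downstream GNN treat each relation separately (for example with relation-specific message passing) or collapse them into a single adjacency matrix; which option the paper uses is not stated in the abstract.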
Pages: 123786-123800
    Heo, Jun
    QUANTUM INFORMATION & COMPUTATION, 2016, 16 (3-4) : 237 - 250