Learning Semantic Program Embeddings with Graph Interval Neural Network

Cited by: 37
Authors
Wang, Yu [1]
Wang, Ke [2]
Gao, Fengjuan [1]
Wang, Linzhang [1]
Affiliations
[1] Nanjing Univ, Dept Comp Sci & Technol, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Visa Res, Secur Cryptog & Blockchain, Palo Alto, CA USA
Funding
National Natural Science Foundation of China;
Keywords
Program embeddings; Control-flow graphs; Intervals; Graph neural networks; Null pointer dereference detection;
DOI
10.1145/3428205
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Learning distributed representations of source code has been a challenging task for machine learning models. Earlier works treated programs as text so that natural language methods could be readily applied. Unfortunately, such approaches do not capitalize on the rich structural information possessed by source code. More recently, the Graph Neural Network (GNN) was proposed to learn embeddings of programs from their graph representations. Because its message-passing procedure is homogeneous (i.e., it does not take advantage of program-specific graph characteristics) and expensive (i.e., it requires heavy information exchange among nodes in the graph), GNN can suffer from precision issues, especially when dealing with programs rendered into large graphs. In this paper, we present a new graph neural architecture, called Graph Interval Neural Network (GINN), to tackle the weaknesses of the existing GNN. Unlike the standard GNN, GINN generalizes from a curated graph representation obtained through an abstraction method designed to help models learn. In particular, GINN focuses exclusively on intervals (generally manifested in looping constructs) for mining the feature representation of a program. Furthermore, GINN operates on a hierarchy of intervals to scale the learning to large graphs. We evaluate GINN on two popular downstream applications: variable misuse prediction and method name prediction. Results show that in both cases GINN outperforms the state-of-the-art models by a comfortable margin. We have also created a neural bug detector based on GINN to catch null pointer dereference bugs in Java code. While learning from the same 9,000 methods extracted from 64 projects, the GINN-based bug detector significantly outperforms the GNN-based bug detector on 13 unseen test projects. Next, we deploy our trained GINN-based bug detector and Facebook Infer, arguably the state-of-the-art static analysis tool, to scan the codebases of 20 highly starred projects on GitHub. Through our manual inspection, we confirm 38 bugs out of 102 warnings raised by the GINN-based bug detector, compared to 34 bugs out of 129 warnings for Facebook Infer. We have reported the 38 bugs GINN caught to developers, among which 11 have been fixed and 12 have been confirmed (fix pending). GINN has been shown to be a general, powerful deep neural network for learning precise, semantic program embeddings.
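To make the notions of "intervals" and "a hierarchy of intervals" on a control-flow graph more concrete, the following is a minimal sketch of the classical interval partition from compiler theory, followed by one step of graph derivation. It is an illustration under assumptions, not the authors' implementation: the abstract does not spell out how GINN computes its intervals, and the function names `interval_partition` and `derived_graph`, the adjacency-list encoding, and the toy loop-shaped graph are all hypothetical.

```python
from collections import defaultdict, deque


def interval_partition(succ, entry):
    """Split a CFG into intervals (classical textbook scheme, assumed here).

    succ maps each node to a list of successor nodes. An interval is grown
    from a header by absorbing every node whose predecessors all lie inside
    the interval; nodes entered only partially become new headers.
    """
    preds = defaultdict(set)
    nodes = {entry} | set(succ)
    for u, vs in succ.items():
        for v in vs:
            preds[v].add(u)
            nodes.add(v)

    intervals = {}
    queued = {entry}            # every node ever chosen as a header
    worklist = deque([entry])
    while worklist:
        header = worklist.popleft()
        members, inside = [header], {header}
        changed = True
        while changed:          # grow until no node can be absorbed
            changed = False
            for n in sorted(nodes):
                if n in inside or n in queued:
                    continue
                if preds[n] and preds[n] <= inside:
                    members.append(n)
                    inside.add(n)
                    changed = True
        intervals[header] = members
        # Nodes reachable from this interval but not absorbed start new intervals.
        for n in sorted(nodes):
            if n not in inside and n not in queued and preds[n] & inside:
                queued.add(n)
                worklist.append(n)
    return intervals


def derived_graph(succ, intervals):
    """Collapse each interval into one node to obtain the next-level graph."""
    owner = {n: h for h, members in intervals.items() for n in members}
    derived = {h: [] for h in intervals}
    for u, vs in succ.items():
        for v in vs:
            hu, hv = owner.get(u), owner.get(v)
            if hu is None or hv is None or hu == hv:
                continue        # skip unreachable nodes and intra-interval edges
            if hv not in derived[hu]:
                derived[hu].append(hv)
    return derived


# Hypothetical CFG of a method containing a single loop.
cfg = {
    "entry": ["loop_head"],
    "loop_head": ["body", "exit"],
    "body": ["loop_head"],
    "exit": [],
}
first = interval_partition(cfg, "entry")
second = interval_partition(derived_graph(cfg, first), "entry")
```

In this toy graph the loop header, loop body, and exit fall into one first-order interval, which matches the abstract's remark that intervals generally correspond to looping constructs. Applying the partition again to the derived graph yields a single second-order interval, a small instance of the hierarchy the abstract refers to.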
Pages: 27