Learning Semantic Program Embeddings with Graph Interval Neural Network

Cited by: 37
Authors
Wang, Yu [1]
Wang, Ke [2]
Gao, Fengjuan [1]
Wang, Linzhang [1]
Affiliations
[1] Nanjing Univ, Dept Comp Sci & Technol, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Visa Res, Secur Cryptog & Blockchain, Palo Alto, CA USA
Funding
National Natural Science Foundation of China;
Keywords
Program embeddings; Control-flow graphs; Intervals; Graph neural networks; Null pointer dereference detection;
DOI
10.1145/3428205
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Learning distributed representations of source code has been a challenging task for machine learning models. Earlier works treated programs as text so that natural language methods could be readily applied. Unfortunately, such approaches do not capitalize on the rich structural information possessed by source code. More recently, Graph Neural Networks (GNNs) were proposed to learn embeddings of programs from their graph representations. Because their message-passing procedure is homogeneous (i.e., it does not take advantage of program-specific graph characteristics) and expensive (i.e., it requires heavy information exchange among nodes in the graph), GNNs can suffer from precision issues, especially when dealing with programs rendered into large graphs. In this paper, we present a new graph neural architecture, called Graph Interval Neural Network (GINN), to tackle the weaknesses of existing GNNs. Unlike the standard GNN, GINN generalizes from a curated graph representation obtained through an abstraction method designed to aid models in learning. In particular, GINN focuses exclusively on intervals (generally manifested in looping constructs) for mining the feature representation of a program. Furthermore, GINN operates on a hierarchy of intervals to scale the learning to large graphs. We evaluate GINN on two popular downstream applications: variable misuse prediction and method name prediction. Results show that in both cases GINN outperforms the state-of-the-art models by a comfortable margin. We have also created a neural bug detector based on GINN to catch null pointer dereference bugs in Java code. While learning from the same 9,000 methods extracted from 64 projects, the GINN-based bug detector significantly outperforms the GNN-based bug detector on 13 unseen test projects. Next, we deploy our trained GINN-based bug detector and Facebook Infer, arguably the state-of-the-art static analysis tool, to scan the codebases of 20 highly starred projects on GitHub. Through manual inspection, we confirm 38 bugs out of 102 warnings raised by the GINN-based bug detector, compared to 34 bugs out of 129 warnings for Facebook Infer. We have reported the 38 bugs GINN caught to developers, among which 11 have been fixed and 12 have been confirmed (fix pending). GINN has been shown to be a general, powerful deep neural network for learning precise, semantic program embeddings.
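The abstract does not spell out how intervals are obtained from a control-flow graph. As a rough illustration only, the sketch below applies the classical interval partition from compiler data-flow analysis to a toy graph. Whether GINN uses exactly this procedure is an assumption here, and the function name `interval_partition`, the node names, and the example graph are all hypothetical.

```python
from collections import deque


def interval_partition(succ, entry):
    """Split a control-flow graph into intervals (classical partition).

    succ maps every node to a list of its successors; each node must be a key.
    Returns a dict: interval header -> member nodes, header first.
    """
    # Build predecessor sets from the successor lists.
    preds = {n: set() for n in succ}
    for n, outs in succ.items():
        for m in outs:
            preds[m].add(n)

    intervals = {}
    assigned = set()          # nodes already placed in some interval
    queued = {entry}          # nodes chosen as interval headers
    worklist = deque([entry])

    while worklist:
        header = worklist.popleft()
        members = [header]
        inside = {header}
        assigned.add(header)

        # Grow the interval: absorb any node whose predecessors
        # all lie inside the interval already.
        changed = True
        while changed:
            changed = False
            for n in succ:
                if n in assigned or n in queued:
                    continue
                if preds[n] and preds[n] <= inside:
                    members.append(n)
                    inside.add(n)
                    assigned.add(n)
                    changed = True
        intervals[header] = members

        # Nodes entered from this interval but not absorbed by it
        # become headers of new intervals.
        for n in succ:
            if n not in assigned and n not in queued and preds[n] & inside:
                queued.add(n)
                worklist.append(n)

    return intervals


# Hypothetical CFG of a method with a single loop.
cfg = {
    "entry": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}
print(interval_partition(cfg, "entry"))
# {'entry': ['entry'], 'loop': ['loop', 'body', 'exit']}
```

In this toy graph the loop header starts its own interval because one of its predecessors (the back edge from `body`) lies outside the entry interval, which matches the abstract's remark that intervals generally show up around looping constructs. Collapsing each interval into a single node and repeating the partition would give one possible reading of the "hierarchy of intervals".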
Pages: 27