deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search

Cited by: 15
Authors
Zeng, Chen [1]
Yu, Yue [1]
Li, Shanshan [1]
Xia, Xin [2]
Wang, Zhiming [1]
Geng, Mingyang [1]
Bai, Linxiao [1]
Dong, Wei [1]
Liao, Xiangke [1]
Affiliations
[1] National University of Defense Technology, School of Computer, Changsha, China
[2] Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Intermediate representation; graph neural networks; code search; deep learning
DOI
10.1145/3546066
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
With the rapid growth of public code repositories, developers increasingly want to retrieve precise code snippets using natural language. Although existing deep learning-based approaches provide end-to-end solutions (i.e., they accept natural language queries and return related code fragments), code search accuracy on large-scale repositories remains low because of limitations in code representation (e.g., AST) and modeling (e.g., directly fusing features in the attention stage). In this paper, we propose a novel learnable deep Graph for Code Search (called deGraphCS) that transforms source code into variable-based flow graphs using an intermediate representation technique. These graphs model code semantics more precisely than processing the code directly as text or using a syntax tree representation. Furthermore, we propose a graph optimization mechanism to refine the code representation and apply an improved gated graph neural network to model the variable-based flow graphs. To evaluate the effectiveness of deGraphCS, we collect a large-scale dataset from GitHub containing 41,152 code snippets written in the C language and reproduce several typical deep code search methods for comparison. The experimental results show that deGraphCS achieves state-of-the-art performance and accurately retrieves code snippets that satisfy users' needs.
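The retrieval step described in the abstract (a natural language query goes in, related code fragments come out) can be illustrated with a minimal sketch. It assumes that queries and code snippets have already been embedded as fixed-length vectors and that candidates are ranked by cosine similarity. The abstract does not state the scoring function, so cosine similarity, the function names `cosine` and `rank_snippets`, and the toy vectors are all assumptions made for illustration and not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first.

    snippet_vecs maps a hypothetical snippet name to its embedding vector.
    """
    scored = [(name, cosine(query_vec, vec)) for name, vec in snippet_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; a real encoder would produce much larger vectors.
    snippets = {
        "read_file": [1.0, 0.0],
        "sort_array": [0.0, 1.0],
        "read_and_sort": [1.0, 1.0],
    }
    query = [1.0, 0.0]  # hypothetical embedding of a query such as "read a file"
    for name, score in rank_snippets(query, snippets, top_k=2):
        print(f"{name}: {score:.3f}")
```

In a system like the one the abstract describes, the snippet vectors would come from the graph encoder and the query vector from a text encoder. Here both are hard-coded so that only the ranking logic is shown.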
Pages: 27