Deep Graph Matching and Searching for Semantic Code Retrieval

Cited by: 47
Authors
Ling, Xiang [1]
Wu, Lingfei [2]
Wang, Saizhuo [1]
Pan, Gaoning [1]
Ma, Tengfei [2]
Xu, Fangli [3]
Liu, Alex X. [4]
Wu, Chunming [1,5]
Ji, Shouling [1]
Affiliations
[1] Zhejiang Univ, 38 Zheda Rd, Hangzhou 310027, Zhejiang, Peoples R China
[2] IBM TJ Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
[3] Squirrel AI Learning, 1601 Gabriel Ln, Highland Pk, NJ 08904 USA
[4] Ant Grp, 556 Xixi Rd, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Lab, 1818 Wenyixi Rd, Hangzhou, Zhejiang, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Neural networks; graph representation; source code retrieval
DOI
10.1145/3447571
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Code retrieval aims to find, in a large corpus of source code repositories, the code snippet that best matches a natural language query. Recent work mainly applies natural language processing techniques to both query texts (i.e., human natural language) and code snippets (i.e., machine programming language), but neglects the deep structural features of query texts and source code, both of which contain rich semantic information. In this article, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for the task of semantic code retrieval. We first represent both natural language query texts and programming language code snippets as unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best matching code snippet. In particular, DGMS not only captures more structural information from individual query texts and code snippets, but also learns the fine-grained similarity between them through cross-attention based semantic matching operations. We evaluate the proposed DGMS model on two public code retrieval datasets covering two representative programming languages (i.e., Java and Python). Experimental results demonstrate that DGMS outperforms state-of-the-art baseline models by a large margin on both datasets. Moreover, our extensive ablation studies systematically investigate and illustrate the impact of each part of DGMS.
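The cross-attention based matching between a query graph and a code graph can be illustrated with a small, dependency-free Python sketch. This is not the authors' DGMS implementation: the abstract does not give the equations, so the dot-product attention, the element-wise comparison, the max pooling, the cosine score, and all function names below are assumptions chosen only for illustration. Node embeddings are taken as given; a graph neural network encoder, which would normally produce them, is omitted.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(row):
    # Subtract the maximum for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(source, target):
    """For each source node, build an attention-weighted average of target nodes."""
    dim = len(target[0])
    contexts = []
    for s in source:
        weights = softmax([dot(s, t) for t in target])
        contexts.append(
            [sum(w * t[k] for w, t in zip(weights, target)) for k in range(dim)]
        )
    return contexts

def match_and_pool(source, target):
    """Compare each source node with its cross-graph context, then max-pool.

    Assumption: element-wise product as the comparison, max over nodes as pooling.
    """
    contexts = attend(source, target)
    matched = [[a * b for a, b in zip(s, c)] for s, c in zip(source, contexts)]
    return [max(col) for col in zip(*matched)]

def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0

def graph_similarity(query_nodes, code_nodes):
    """Score one (query graph, code graph) pair from their node embeddings."""
    q_vec = match_and_pool(query_nodes, code_nodes)
    c_vec = match_and_pool(code_nodes, query_nodes)
    return cosine(q_vec, c_vec)

def rank_snippets(query_nodes, candidates):
    """Return candidate indices sorted from most to least similar."""
    sims = [graph_similarity(query_nodes, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: sims[i], reverse=True)

# Toy node embeddings (hypothetical values, not taken from the paper).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # nodes aligned with the query
    [[-1.0, 0.0], [0.0, -1.0]],            # nodes opposed to the query
]
ranking = rank_snippets(query, candidates)
```

In this sketch, retrieval reduces to scoring every candidate graph against the query graph and sorting by score. Unlike this fixed scoring function, a trained model would learn the encoder and matching parameters end to end.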
Pages: 21