Retrieval on Source Code: A Neural Code Search

Cited by: 95
Authors
Sachdev, Saksham [1]
Li, Hongyu [2]
Luan, Sifei [2]
Kim, Seohyun [2]
Sen, Koushik [3]
Chandra, Satish [2]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Facebook Inc, Cambridge, MA USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
code search; word-embedding; TF-IDF
DOI
10.1145/3211346.3211353
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers can express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e., without having a curated Q&A forum such as Stack Overflow at hand. Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word-embedding based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
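As an illustration of what a basic word-embedding based search combined with TF-IDF could look like, here is a minimal Python sketch. It is not the authors' implementation. The embedding table, the toy corpus, and the function names (`idf`, `weighted_mean`, `cosine`, `search`) are all hypothetical, and the choice to weight document tokens by TF-IDF while averaging query tokens uniformly is an assumption made for the example.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors; a real system would learn
# embeddings from a large code corpus instead of hard-coding them.
EMBEDDINGS = {
    "read": (0.9, 0.1, 0.0),
    "file": (0.8, 0.2, 0.1),
    "open": (0.7, 0.3, 0.0),
    "close": (0.6, 0.3, 0.1),
    "sort": (0.0, 0.9, 0.1),
    "list": (0.1, 0.8, 0.2),
    "http": (0.1, 0.1, 0.9),
    "request": (0.0, 0.2, 0.8),
    "send": (0.1, 0.0, 0.9),
}
DIM = 3

# Hypothetical corpus: each code snippet reduced to a bag of identifier tokens.
CORPUS = {
    "readFile": ["open", "file", "read", "close"],
    "sortItems": ["sort", "list"],
    "fetchUrl": ["http", "request", "send"],
}


def idf(corpus):
    """Inverse document frequency of every token in the corpus."""
    n = len(corpus)
    df = Counter(tok for toks in corpus.values() for tok in set(toks))
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}


def weighted_mean(tokens, weights):
    """Weighted average of token embeddings; unknown tokens are skipped.

    Repeated tokens are summed once per occurrence, which supplies the
    term-frequency part of TF-IDF. Missing weights default to 1.0.
    """
    total = [0.0] * DIM
    norm = 0.0
    for tok in tokens:
        vec = EMBEDDINGS.get(tok)
        if vec is None:
            continue
        w = weights.get(tok, 1.0)
        for i, x in enumerate(vec):
            total[i] += w * x
        norm += w
    return [x / norm for x in total] if norm else total


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query, corpus, top_k=2):
    """Rank snippets by similarity between query and document vectors."""
    weights = idf(corpus)
    q = weighted_mean(query.lower().split(), {})  # uniform query weights
    scored = [
        (cosine(q, weighted_mean(toks, weights)), name)
        for name, toks in corpus.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


if __name__ == "__main__":
    print(search("read file", CORPUS))
```

In this sketch the query and each snippet are mapped into the same vector space, so ranking reduces to a nearest-neighbour lookup by cosine similarity. The supervision layer and customized ranking strategy mentioned in the abstract are not modelled here.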
Pages: 31-41
Page count: 11
Related papers
50 in total
  • [21] A Search-Based Testing Framework for Deep Neural Networks of Source Code Embedding
    Pour, Maryam Vahdat
    Li, Zhuo
    Ma, Lei
    Hemmati, Hadi
    2021 14TH IEEE CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION (ICST 2021), 2021, : 36 - 46
  • [22] Retrieving Self-executable and Functionally Correct Code to Improve Source Code Search
    Satter, Abdus
    Muntaqeem, M. G.
    Nahar, Nadia
    Sakib, Kazi
    2017 24TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC 2017), 2017, : 749 - 750
  • [23] CoNCRA: A Convolutional Neural Networks Code Retrieval Approach
    Martins, Marcelo de Rezende
    Gerosa, Marco Aurelio
    34TH BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING, SBES 2020, 2020, : 526 - 531
  • [24] Source Code Classification Using Neural Networks
    Gilda, Shlok
    PROCEEDINGS OF 2017 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2017,
  • [25] Towards Summarizing Program Statements in Source Code Search
    Marin, Victor J.
    Bansal, Iti
    Rivero, Carlos R.
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 118 - 120
  • [26] Applying a Semantic Layer in a Source Code Search Tool
    Durao, Frederico A.
    Vanderlei, Taciana A.
    Almeida, Eduardo S.
    Meira, Silvio R. de L.
    APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 1151 - 1157
  • [27] A FRAMEWORK FOR SOURCE CODE SEARCH USING PROGRAM PATTERNS
    PAUL, S
    PRAKASH, A
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1994, 20 (06) : 463 - 475
  • [28] Combining Holistic Source Code Representation with Siamese Neural Networks for Detecting Code Clones
    Patel, Smit
    Sinha, Roopak
    TESTING SOFTWARE AND SYSTEMS, ICTSS 2021, 2022, 13045 : 148 - 159
  • [29] Search in Source Code Based on Identifying Popular Fragments
    Kuric, Eduard
    Bielikova, Maria
    SOFSEM 2013: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2013, 7741 : 408 - 419
  • [30] A FRAMEWORK FOR SOURCE CODE SEARCH USING PROGRAM PATTERNS
    DEVANBU, P
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1995, 21 (12) : 1009 - 1010