Retrieval on Source Code: A Neural Code Search

Cited by: 95
Authors
Sachdev, Saksham [1]
Li, Hongyu [2]
Luan, Sifei [2]
Kim, Seohyun [2]
Sen, Koushik [3]
Chandra, Satish [2]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Facebook Inc, Cambridge, MA USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
code search; word-embedding; TF-IDF
DOI
10.1145/3211346.3211353
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers can express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e., without having a curated Q&A forum such as Stack Overflow at hand. Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word-embedding-based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
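To make the idea of a "basic word-embedding based search procedure" concrete, the following is a minimal sketch and not the authors' implementation. It assumes each code snippet has already been tokenized into words, represents a snippet as the IDF-weighted sum of its word vectors, and ranks snippets by cosine similarity to the query vector. The embedding table, the corpus, and the names `EMBEDDINGS`, `CORPUS`, `idf_weights`, `embed`, and `search` are all hypothetical; a real system would learn the embeddings from a large code corpus.

```python
import math
from collections import Counter

# Hypothetical toy word vectors (assumption); real embeddings would be
# learned from a large code corpus rather than written by hand.
EMBEDDINGS = {
    "close": [1.0, 0.0, 0.0],
    "hide": [0.9, 0.1, 0.0],
    "keyboard": [0.0, 1.0, 0.0],
    "input": [0.1, 0.9, 0.0],
    "read": [0.0, 0.0, 1.0],
    "file": [0.0, 0.1, 0.9],
    "lines": [0.0, 0.0, 0.8],
}

# Hypothetical corpus: snippet name -> tokens extracted from the snippet.
CORPUS = {
    "hideSoftKeyboard": ["hide", "input", "keyboard"],
    "readFileLines": ["read", "file", "lines"],
}


def idf_weights(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def embed(tokens, idf):
    """IDF-weighted sum of word vectors, scaled to unit length."""
    dim = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dim
    for tok in tokens:
        if tok in EMBEDDINGS:
            weight = idf.get(tok, 1.0)  # tokens unseen in the corpus get weight 1
            for i, x in enumerate(EMBEDDINGS[tok]):
                vec[i] += weight * x
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query, corpus, top_k=1):
    """Return the names of the top_k snippets most similar to the query."""
    idf = idf_weights(list(corpus.values()))
    doc_vecs = {name: embed(toks, idf) for name, toks in corpus.items()}
    q = embed(query.lower().split(), idf)
    # Vectors are unit length, so the dot product equals cosine similarity.
    score = lambda name: sum(a * b for a, b in zip(q, doc_vecs[name]))
    return sorted(corpus, key=score, reverse=True)[:top_k]


print(search("close keyboard", CORPUS))
```

Because "close" sits near "hide" in the toy embedding space, the query matches the keyboard-hiding snippet even though the two share only one literal token. This is the kind of vocabulary gap that an embedding-based search is meant to bridge. The supervised layer and customized ranking mentioned in the abstract are not modeled here.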
Pages: 31-41
Number of pages: 11
Related papers
50 in total
  • [1] A Neural Framework for Retrieval and Summarization of Source Code
    Chen, Qingying
    Zhou, Minghui
    [J]. PROCEEDINGS OF THE 2018 33RD IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE '18), 2018, : 826 - 831
  • [2] Retrieval-based Neural Source Code Summarization
    Zhang, Jian
    Wang, Xu
    Zhang, Hongyu
    Sun, Hailong
    Liu, Xudong
    [J]. 2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, : 1385 - 1397
  • [3] ANNE: Improving Source Code Search using Entity Retrieval Approach
    Vinayakarao, Venkatesh
    Sarma, Anita
    Purandare, Rahul
    Jain, Shuktika
    Jain, Saumya
    [J]. WSDM'17: PROCEEDINGS OF THE TENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2017, : 211 - 220
  • [4] Search for Compatible Source Code
    Cai, Fuqi
    Wang, Changjing
    Huang, Qing
    Zuo, Zhengkang
    Liao, Yunyan
    [J]. INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2021, 31 (03) : 477 - 502
  • [5] Solving the Search for Source Code
    Stolee, Kathryn T.
    Elbaum, Sebastian
    Dobos, Daniel
    [J]. ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2014, 23 (03)
  • [6] Using Fuzzy Code Search to Link Code Fragments in Discussions to Source Code
    Bettenburg, Nicolas
    Thomas, Stephen W.
    Hassan, Ahmed E.
    [J]. 2012 16TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING (CSMR), 2012, : 319 - 328
  • [7] Backdooring Neural Code Search
    Sun, Weisong
    Chen, Yuchen
    Tao, Guanhong
    Fang, Chunrong
    Zhang, Xiangyu
    Zhang, Quanjun
    Luo, Bin
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 9692 - 9708
  • [8] Code-to-Code Search Based on Deep Neural Network and Code Mutation
    Fujiwara, Yuji
    Yoshida, Norihiro
    Choi, Eunjong
    Inoue, Katsuro
    [J]. 2019 IEEE 13TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES (IWSC '19), 2019, : 1 - 7
  • [9] Navigating the Neural Space in Search of the Neural Code
    Jazayeri, Mehrdad
    Afraz, Arash
    [J]. NEURON, 2017, 93 (05) : 1003 - 1014
  • [10] Leveraging source code search for reuse
    Happel, Hans-Joerg
    Schuster, Thomas
    Szulman, Peter
    [J]. HIGH CONFIDENCE SOFTWARE REUSE IN LARGE SYSTEMS, PROCEEDINGS, 2008, 5030 : 360 - 371