Retrieval on Source Code: A Neural Code Search

Cited by: 95
Authors
Sachdev, Saksham [1]
Li, Hongyu [2]
Luan, Sifei [2]
Kim, Seohyun [2]
Sen, Koushik [3]
Chandra, Satish [2]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Facebook Inc, Cambridge, MA USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
code search; word-embedding; TF-IDF
DOI
10.1145/3211346.3211353
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers could express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand. Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word-embedding based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
Pages: 31-41
Number of pages: 11
Related papers
50 in total
  • [31] STraceBERT: Source Code Retrieval using Semantic Application Traces
    Spiess, Claudio
    PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023, : 2207 - 2209
  • [32] Labeling source code with information retrieval methods: an empirical study
    Andrea De Lucia
    Massimiliano Di Penta
    Rocco Oliveto
    Annibale Panichella
    Sebastiano Panichella
    Empirical Software Engineering, 2014, 19 : 1383 - 1420
  • [33] Between sound and perception: reviewing the search for a neural code
    Eggermont, JJ
    HEARING RESEARCH, 2001, 157 (1-2) : 1 - 42
  • [35] Application of Information Retrieval Techniques for Source Code Authorship Attribution
    Burrows, Steven
    Uitdenbogerd, Alexandra L.
    Turpin, Andrew
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2009, 5463 : 699 - 713
  • [36] Improving Source Code Lexicon via Traceability and Information Retrieval
    De Lucia, Andrea
    Di Penta, Massimiliano
    Oliveto, Rocco
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2011, 37 (02) : 205 - 227
  • [37] Modeling Source Code to Support Retrieval-Based Applications
    Vinayakarao, Venkatesh
    WSDM'17: PROCEEDINGS OF THE TENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2017, : 833 - 833
  • [38] Source Code Retrieval for Bug Localization using Bug Report
    Swe, Kyaw Ei Ei
    Oo, Hnin Min
    2019 IEEE 15TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP 2019), 2019, : 241 - 247
  • [39] Labeling source code with information retrieval methods: an empirical study
    De Lucia, Andrea
    Di Penta, Massimiliano
    Oliveto, Rocco
    Panichella, Annibale
    Panichella, Sebastiano
    EMPIRICAL SOFTWARE ENGINEERING, 2014, 19 (05) : 1383 - 1420
  • [40] Searching program source code with a structured text retrieval system
    Clarke, C
    Cox, A
    Sim, S
    SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 307 - 308