Retrieval on Source Code: A Neural Code Search

Cited by: 95
Authors
Sachdev, Saksham [1]
Li, Hongyu [2]
Luan, Sifei [2]
Kim, Seohyun [2]
Sen, Koushik [3]
Chandra, Satish [2]
Affiliations
[1] University of Waterloo, Waterloo, ON, Canada
[2] Facebook Inc., Cambridge, MA, USA
[3] University of California, Berkeley, Berkeley, CA 94720, USA
Keywords
code search; word-embedding; TF-IDF
DOI
10.1145/3211346.3211353
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers can express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e., without having a curated Q&A forum such as Stack Overflow at hand. Our experiments, using a benchmark suite derived from Stack Overflow and GitHub repositories, show promising results. We find that while a basic word-embedding-based search procedure works acceptably, better results can be obtained by adding a layer of supervision and by using a customized ranking strategy.
Pages: 31-41
Page count: 11