Fusing Code Searchers

Authors
Wang, Shangwen [1]
Geng, Mingyang [1]
Lin, Bo [1]
Sun, Zhensu [2]
Wen, Ming [3]
Liu, Yepang [4]
Li, Li [5]
Bissyande, Tegawende F. [6]
Mao, Xiaoguang [1]
Affiliations
[1] Natl Univ Def Technol, Changsha 410073, Peoples R China
[2] ShanghaiTech Univ, Shanghai 201210, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China
[4] Southern Univ Sci & Technol, Shenzhen 518055, Peoples R China
[5] Monash Univ, Clayton, Vic 3800, Australia
[6] Univ Luxembourg, L-1359 Luxembourg, Luxembourg
Keywords
Codes; Information retrieval; Data fusion
DOI
10.1109/TSE.2024.3403042
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Code search, which consists in retrieving relevant code snippets from a codebase based on a given query, provides developers with useful references during software development. Over the years, techniques that adopt different mechanisms to compute the relevance score between a query and a code snippet have been proposed to advance the state of the art in this domain, including those relying on information retrieval, supervised learning, and pre-training. Despite this progress, the usefulness of existing techniques is still limited, since none of them can effectively handle the full diversity of queries and code encountered in practice. To tackle this challenge, we present Dancer, a data-fusion-based code searcher. Our intuition (which is also the basic hypothesis of this study) is that existing techniques may complement each other because of the intrinsic differences in their working mechanisms. We validated this hypothesis through an exploratory study. Building on it, we propose to fuse the results generated by different code search techniques so that the strengths of each standalone technique can be fully leveraged. Specifically, we treat each technique as a retrieval system and use well-known data fusion approaches to aggregate the results from the different systems. We evaluate six existing code search techniques on two large-scale datasets and apply eight classic data fusion approaches to combine their results. Our experiments show that the best fusion approach outperforms the standalone techniques by 35% to 550% and by 65% to 825% in terms of MRR (mean reciprocal rank) on the two datasets, respectively.
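The abstract describes treating each code search technique as a retrieval system, fusing their ranked outputs, and measuring the result with MRR. The sketch below illustrates that pipeline with Reciprocal Rank Fusion (RRF), a classic data fusion method. The abstract does not name the eight fusion approaches Dancer evaluates, so RRF here is an illustrative choice and not necessarily one of them. The function names, the constant k = 60, the assumption of exactly one relevant snippet per query, and the toy rankings are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of snippet IDs with Reciprocal Rank Fusion.

    rankings: one ranked list per code search technique (best result first).
    k: smoothing constant (hypothetical choice; 60 is a commonly used value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, snippet_id in enumerate(ranking, start=1):
            # Each system contributes 1 / (k + rank) for every snippet it returns.
            scores[snippet_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by snippet ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


def mean_reciprocal_rank(ranked_lists, gold):
    """MRR over queries, assuming exactly one relevant snippet per query."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, gold):
        for rank, snippet_id in enumerate(ranking, start=1):
            if snippet_id == relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


# Hypothetical toy data: two searchers, two queries.
searcher_a = [["s3", "s1", "s2"], ["s5", "s6"]]
searcher_b = [["s1", "s4", "s3"], ["s7", "s5"]]
gold = ["s1", "s5"]

fused_rankings = [
    reciprocal_rank_fusion([a, b]) for a, b in zip(searcher_a, searcher_b)
]
print(fused_rankings)  # → [['s1', 's3', 's4', 's2'], ['s5', 's7', 's6']]
print(
    mean_reciprocal_rank(searcher_a, gold),
    mean_reciprocal_rank(searcher_b, gold),
    mean_reciprocal_rank(fused_rankings, gold),
)  # → 0.75 0.75 1.0
```

In this toy data, each hypothetical searcher ranks the relevant snippet first for only one of the two queries (MRR 0.75 each), while the fused list ranks it first for both (MRR 1.0). This mirrors, on a tiny scale, the complementarity hypothesis stated in the abstract; it is not a reproduction of the paper's experiments.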
Pages: 1852-1866
Number of pages: 15