Sourcerer: An infrastructure for large-scale collection and analysis of open-source code

Cited by: 55
Authors
Bajracharya, Sushil [1]
Ossher, Joel [1]
Lopes, Cristina [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
Keywords
Open source; Internet-scale code retrieval; Data mining; Sourcerer; Static analysis; Software information retrieval; SOFTWARE; SEARCH; REUSE;
DOI
10.1016/j.scico.2012.04.008
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 241 - 259
Page count: 19
Related papers
50 in total
  • [31] Large-Scale Geospatial Analysis of Suitable Siting for Green Stormwater Infrastructure: An Open-Source Tool for Promoting Sustainability and Environmental Justice in Urban Communities
    Hoque, S. M. Mushfiqul
    Kamanmalek, Sara
    Alamdari, Nasrin
    JOURNAL OF ENVIRONMENTAL ENGINEERING, 2024, 150 (12)
  • [32] BeeGround - An Open-Source Simulation Platform for Large-Scale Swarm Robotics Applications
    Lim, Sean
    Wang, Shiyi
    Lennox, Barry
    Arvin, Farshad
    2021 7TH INTERNATIONAL CONFERENCE ON AUTOMATION, ROBOTICS AND APPLICATIONS (ICARA 2021), 2021, : 75 - 79
  • [33] An open-source framework for large-scale transient topology optimization using PETSc
    Hansotto Kristiansen
    Niels Aage
    Structural and Multidisciplinary Optimization, 2022, 65
  • [34] Game-theory strategies for open-source Infrastructure-as-Code
    de la Fuente Ruiz, Alfonso E.
    Nedeltcheva, Galia Novakova
    2023 IEEE 20TH INTERNATIONAL CONFERENCE ON SOFTWARE ARCHITECTURE COMPANION, ICSA-C, 2023, : 328 - 332
  • [35] Large-Scale Data Analysis for Glucose Variability Outcomes with Open-Source Automated Insulin Delivery Systems
    Shahid, Arsalan
    Lewis, Dana M.
    NUTRIENTS, 2022, 14 (09)
  • [36] Detecting code vulnerabilities by learning from large-scale open source repositories
    Xu, Rongze
    Tang, Zhanyong
    Ye, Guixin
    Wang, Huanting
    Ke, Xin
    Fang, Dingyi
    Wang, Zheng
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2022, 69
  • [37] Code Coverage and Postrelease Defects: A Large-Scale Study on Open Source Projects
    Kochhar, Pavneet Singh
    Lo, David
    Lawall, Julia
    Nagappan, Nachiappan
    IEEE TRANSACTIONS ON RELIABILITY, 2017, 66 (04) : 1213 - 1228
  • [38] Understanding Source Code Comments at Large-Scale
    He, Hao
    ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2019, : 1217 - 1219
  • [39] TypeScript: An Open-Source Programming Language with Options for Robust Development and Large-Scale Applications
    Acropolis Institute of Technology and Research, Dept. of Computer Science and Information Technology, Indore, India
    Int. Conf. Adv. Comput. Res. Sci. Eng. Technol., ACROSET, 2024,
  • [40] QuoVidi: An open-source web application for the organization of large-scale biological treasure hunts
    Lobet, Guillaume
    Descamps, Charlotte
    Leveau, Lola
    Guillet, Alain
    Rees, Jean-Francois
    ECOLOGY AND EVOLUTION, 2021, 11 (08) : 3516 - 3526