A Large-Scale Study On Repetitiveness, Containment, and Composability of Routines in Open-Source Projects

Times cited: 0
Authors
Anh Tuan Nguyen [1]
Hoan Anh Nguyen [1]
Tien N. Nguyen [1]
Affiliation
[1] Iowa State University, Department of Electrical and Computer Engineering (ECpE), Ames, IA 50011, USA
Keywords
Repetitiveness; Containment; Composability; Code Reuse; Software
DOI
10.1145/2901739.2901759
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Source code in software systems has been shown to have a good degree of repetitiveness at the lexical, syntactical, and API usage levels. This paper presents a large-scale study on the repetitiveness, containment, and composability of source code at the semantic level. We collected a large dataset consisting of 9,224 Java projects with 2.79M class files and 17.54M methods, totaling 187M SLOC. For each method in a project, we build a program dependency graph (PDG) to represent the routine, and we compare PDGs with one another as well as with the subgraphs within them. We found that within a project, 12.1% of the routines are repeated, and most of them repeat 2 to 7 times. As a whole, routines are quite project-specific: only 3.3% of them repeat exactly in 1 to 4 other projects, at most 8 times. We also found that 26.1% and 7.27% of the routines are contained in other routine(s), i.e., implemented as part of other routine(s) elsewhere within the same project and in other projects, respectively. Except for trivial routines, their repetitiveness and containment are independent of their complexity. Defining a subroutine as a per-variable slicing subgraph in a PDG, we found that 14.3% of all routines have all of their subroutines repeated. A high percentage of the subroutines in a routine can be found or reused elsewhere. We collected 8,764,971 unique subroutines (including 323,564 unique JDK subroutines) as basic units for code search and synthesis. We also discuss the practical implications of our findings for automated tools.
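The abstract describes a pipeline: represent each method as a PDG, define subroutines as per-variable slicing subgraphs, and count how often whole routines and their subroutines repeat. The Python sketch below illustrates that pipeline on toy data; it is not the authors' implementation. The `PDG` class, the node-label strings, the reading of "per-variable slicing subgraph" as the backward slice from every node that touches a variable, and the use of Weisfeiler-Lehman hashing as a stand-in for exact graph matching are all assumptions made for illustration.

```python
"""Toy sketch of PDG-based repetition analysis (hypothetical data model, not the paper's tool)."""
import hashlib
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PDG:
    # Hypothetical minimal PDG: node id -> normalized label (e.g. "call:List.add"),
    # edges (src, dst) meaning "dst depends on src" (data or control dependency),
    # and node id -> variables the node reads or writes.
    labels: dict[int, str]
    edges: set[tuple[int, int]]
    vars_of: dict[int, set[str]] = field(default_factory=dict)

    def subgraph(self, nodes: set[int]) -> "PDG":
        """Induced subgraph on the given node ids."""
        return PDG(
            {n: self.labels[n] for n in nodes},
            {(a, b) for (a, b) in self.edges if a in nodes and b in nodes},
            {n: self.vars_of.get(n, set()) for n in nodes},
        )


def wl_hash(g: PDG, rounds: int = 3) -> str:
    """Weisfeiler-Lehman hash (stand-in for exact matching): isomorphic labeled
    graphs always hash equal; non-isomorphic ones can occasionally collide."""
    color = dict(g.labels)
    for _ in range(rounds):
        new = {}
        for n in g.labels:
            preds = sorted(color[a] for (a, b) in g.edges if b == n)
            succs = sorted(color[b] for (a, b) in g.edges if a == n)
            new[n] = hashlib.sha1(repr((color[n], preds, succs)).encode()).hexdigest()
        color = new
    return hashlib.sha1(repr(sorted(color.values())).encode()).hexdigest()


def backward_slice(g: PDG, seeds: set[int]) -> set[int]:
    """Seeds plus every node they transitively depend on."""
    preds: dict[int, set[int]] = {}
    for a, b in g.edges:
        preds.setdefault(b, set()).add(a)
    seen, stack = set(seeds), list(seeds)
    while stack:
        for p in preds.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def subroutines(g: PDG) -> list[PDG]:
    """One subroutine per variable (assumed reading of 'per-variable slicing
    subgraph'): the backward slice from every node that touches the variable."""
    all_vars = set().union(*g.vars_of.values())
    return [
        g.subgraph(backward_slice(g, {n for n, vs in g.vars_of.items() if v in vs}))
        for v in sorted(all_vars)
    ]


def repetition_stats(routines: list[PDG]) -> tuple[float, float]:
    """(share of routines whose whole PDG occurs more than once,
        share of routines all of whose subroutines also occur in another routine)."""
    if not routines:
        return 0.0, 0.0
    whole = Counter(wl_hash(r) for r in routines)
    sub_sets = [{wl_hash(s) for s in subroutines(r)} for r in routines]
    # Number of distinct routines in which each subroutine hash appears.
    owners = Counter(h for hs in sub_sets for h in hs)
    repeated = sum(1 for r in routines if whole[wl_hash(r)] > 1)
    all_subs = sum(1 for hs in sub_sets if hs and all(owners[h] > 1 for h in hs))
    return repeated / len(routines), all_subs / len(routines)


if __name__ == "__main__":
    # r1 and r2: `x = new List(); x.add(e)` with different names/ids; r3 adds `y.size()`.
    r1 = PDG({1: "new:List", 2: "call:List.add"}, {(1, 2)}, {1: {"x"}, 2: {"x", "e"}})
    r2 = PDG({7: "new:List", 9: "call:List.add"}, {(7, 9)}, {7: {"list"}, 9: {"list", "item"}})
    r3 = PDG({1: "new:List", 2: "call:List.add", 3: "call:List.size"},
             {(1, 2), (1, 3)}, {1: {"y"}, 2: {"y", "e"}, 3: {"y"}})
    print(repetition_stats([r1, r2, r3]))  # (0.6666666666666666, 0.6666666666666666)
```

In the toy run, r1 and r2 match as whole routines despite different variable names, and r3's slice for `e` matches them too, but r3's slice for `y` occurs nowhere else, so r3 does not have all of its subroutines repeated. WL hashing never separates isomorphic graphs but can, rarely, merge non-isomorphic ones, so a real study would confirm candidate matches with an exact isomorphism check. Containment (a routine occurring as part of another routine) would additionally require subgraph matching, which this sketch does not attempt.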
Pages: 362-373
Number of pages: 12
Related papers (50 in total)
  • [31] ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling
    Li, Quanyi
    Peng, Zhenghao
    Feng, Lan
    Liu, Zhizheng
    Duan, Chenda
    Mo, Wenjie
    Zhou, Bolei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [32] MeshMonk: Open-source large-scale intensive 3D phenotyping
    White, Julie D.
    Ortega-Castrillon, Alejandra
    Matthews, Harold
    Zaidi, Arslan A.
    Ekrami, Omid
    Snyders, Jonatan
    Fan, Yi
    Penington, Tony
    Van Dongen, Stefan
    Shriver, Mark D.
    Claes, Peter
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [33] BeeGround - An Open-Source Simulation Platform for Large-Scale Swarm Robotics Applications
    Lim, Sean
    Wang, Shiyi
    Lennox, Barry
    Arvin, Farshad
    2021 7TH INTERNATIONAL CONFERENCE ON AUTOMATION, ROBOTICS AND APPLICATIONS (ICARA 2021), 2021: 75 - 79
  • [34] An open-source framework for large-scale transient topology optimization using PETSc
    Hansotto Kristiansen
    Niels Aage
    Structural and Multidisciplinary Optimization, 2022, 65
  • [35] A large-scale empirical exploration on refactoring activities in open source software projects
    Vassallo, Carmine
    Grano, Giovanni
    Palomba, Fabio
    Gall, Harald C.
    Bacchelli, Alberto
    SCIENCE OF COMPUTER PROGRAMMING, 2019, 180 : 1 - 15
  • [36] A bug finder refined by a large set of open-source projects
    Nam, Jaechang
    Wang, Song
    Xi, Yuan
    Tan, Lin
    INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 112 : 164 - 175
  • [37] Exploring the Characteristics of Identifiers: A Large-Scale Empirical Study on 5,000 Open Source Projects
    Zhang, Jingxuan
    Liu, Siyuan
    Luo, Junpeng
    Liang, Jiahui
    Huang, Zhiqiu
    IEEE ACCESS, 2020, 8 : 140607 - 140620
  • [38] TypeScript: An Open-Source Programming Language with Options for Robust Development and Large-Scale Applications
    Acropolis Institute of Technology and Research, Dept. of Computer Science and Information Technology, Indore, India
    Int. Conf. Adv. Comput. Res. Sci. Eng. Technol., ACROSET, 2024
  • [39] QuoVidi: An open-source web application for the organization of large-scale biological treasure hunts
    Lobet, Guillaume
    Descamps, Charlotte
    Leveau, Lola
    Guillet, Alain
    Rees, Jean-Francois
    ECOLOGY AND EVOLUTION, 2021, 11 (08): 3516 - 3526
  • [40] GATECloud.net: a platform for large-scale, open-source text processing on the cloud
    Tablan, Valentin
    Roberts, Ian
    Cunningham, Hamish
    Bontcheva, Kalina
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2013, 371 (1983)