A Large-Scale Study On Repetitiveness, Containment, and Composability of Routines in Open-Source Projects

Cited by: 0
Authors
Anh Tuan Nguyen [1]
Hoan Anh Nguyen [1]
Tien N. Nguyen [1]
Affiliations
[1] Iowa State University, ECpE Department, Ames, IA 50011, USA
Keywords
Repetitiveness; Containment; Composability; Code Reuse; Software
DOI
10.1145/2901739.2901759
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Source code in software systems has been shown to have a considerable degree of repetitiveness at the lexical, syntactic, and API usage levels. This paper presents a large-scale study on the repetitiveness, containment, and composability of source code at the semantic level. We collected a large dataset consisting of 9,224 Java projects with 2.79M class files and 17.54M methods, totaling 187M SLOC. For each method in a project, we build the program dependency graph (PDG) to represent a routine, and compare PDGs with one another as well as the subgraphs within them. We found that within a project, 12.1% of the routines are repeated, and most of them repeat 2 to 7 times. Taken in their entirety, the routines are quite project-specific: only 3.3% of them repeat exactly, in 1 to 4 other projects and at most 8 times. We also found that 26.1% and 7.27% of the routines are contained in other routine(s), i.e., implemented as part of other routine(s) elsewhere within a project and in other projects, respectively. Except for trivial routines, their repetitiveness and containment are independent of their complexity. Defining a subroutine via a per-variable slicing subgraph in a PDG, we found that 14.3% of all routines have all of their subroutines repeated. A high percentage of subroutines in a routine can be found/reused elsewhere. We collected 8,764,971 unique subroutines (with 323,564 unique JDK subroutines) as basic units for code searching/synthesis. We also provide practical implications of our findings for automated tools.
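The three notions in the abstract (repetition, containment, and per-variable slices) can be illustrated on toy dependence graphs. The following is a minimal sketch and not the authors' implementation: the graph encoding, the node labels, and every function name are hypothetical, and graphs are compared through a simple multiset of labelled nodes and edges, which is only an approximation of real graph matching.

```python
from collections import Counter, defaultdict

# Hypothetical toy encoding of a PDG: nodes map id -> label,
# edges are (src_id, dst_id, kind) with kind "data" or "control".

def fingerprint(nodes, edges):
    """Order- and id-independent summary: bags of labelled edges and node labels."""
    edge_bag = Counter((nodes[s], kind, nodes[d]) for s, d, kind in edges)
    node_bag = Counter(nodes.values())
    return edge_bag, node_bag

def key(fp):
    """Hashable form of a fingerprint, usable as a dictionary key."""
    edge_bag, node_bag = fp
    return frozenset(edge_bag.items()), frozenset(node_bag.items())

def is_contained(inner, outer):
    """True if every labelled node and edge of inner occurs in outer at least as often."""
    return (all(outer[0][e] >= n for e, n in inner[0].items())
            and all(outer[1][lab] >= n for lab, n in inner[1].items()))

def repeated_fraction(routines):
    """Share of routines whose fingerprint occurs more than once in the list."""
    keys = [key(fingerprint(n, e)) for n, e in routines]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] > 1) / len(routines)

def backward_slice(nodes, edges, target):
    """Subgraph of everything that `target` transitively depends on (a per-variable slice)."""
    preds = defaultdict(list)
    for s, d, _kind in edges:
        preds[d].append(s)
    seen, stack = {target}, [target]
    while stack:
        cur = stack.pop()
        for p in preds[cur]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    sub_nodes = {i: nodes[i] for i in seen}
    sub_edges = [(s, d, k) for s, d, k in edges if s in seen and d in seen]
    return sub_nodes, sub_edges

# Two copies of the same summing routine (different node ids) and a larger
# routine that contains it plus an extra logging call. All made-up data.
sum_a = ({0: "decl int", 1: "loop", 2: "add", 3: "return"},
         [(0, 2, "data"), (1, 2, "control"), (2, 3, "data")])
sum_b = ({10: "decl int", 11: "loop", 12: "add", 13: "return"},
         [(10, 12, "data"), (11, 12, "control"), (12, 13, "data")])
bigger = ({0: "decl int", 1: "loop", 2: "add", 3: "return", 4: "call log"},
          [(0, 2, "data"), (1, 2, "control"), (2, 3, "data"), (2, 4, "data")])

fraction = repeated_fraction([sum_a, sum_b, bigger])
contained = is_contained(fingerprint(*sum_a), fingerprint(*bigger))
slice_of_return = backward_slice(bigger[0], bigger[1], target=3)
```

In this toy data, the slice of `bigger` taken at its `return` node has the same fingerprint as `sum_a`, which is the sense in which a subroutine of one routine can be found elsewhere. A bag-of-edges fingerprint can conflate graphs that are not actually isomorphic, so a study at the scale described above would need a stricter matching procedure.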
Pages: 362-373
Number of pages: 12