Characterizing Commits in Open-Source Software

Cited by: 0
Authors
Ferreira, Mivian M. [1]
Goncalves, Diego Santos [2]
Bigonha, Mariza A. S. [1]
Ferreira, Kecia A. M. [2]
Affiliations
[1] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
[2] Fed Ctr Technol Educ Minas Gerais, Belo Horizonte, MG, Brazil
Keywords
empirical study; commit; open-source; mining software repositories; Java
DOI
10.1145/3571473.3571508
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Mining software repositories has been the basis of many studies in software engineering. Many of these works rely on data extracted from commits, since the commit is the basic unit of information about activities performed on a project. However, not knowing the characteristics of commits may introduce biases and threats to validity in studies that consider commit data. This work presents an empirical study that characterizes commits in terms of four aspects: the size of commits in total number of files; the size of commits in number of source-code files; the size of commits by category; and the time interval between commits made by contributors. We analyzed 1M commits from the 24 most popular and active Java-based projects hosted on GitHub. The main findings are: the size of commits follows a heavy-tailed distribution; most commits involve one to 10 files; most commits affect one to four source-code files; commits involving hundreds of files are not limited to merge or management activities; the distribution of the time intervals is approximately Normal, i.e., it tends to be symmetric and the mean is representative; and, on average, a developer makes a commit every eight hours. Researchers should consider these results in empirical work to avoid biases when analyzing commit data. The results also provide information that practitioners may apply to improve the management and planning of software activities.
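The two kinds of measurement named in the abstract (commit size in files and the time interval between a contributor's commits) can be illustrated with a minimal Python sketch. This is not the authors' tooling. It assumes the input text was produced by `git log --name-only --pretty=format:'@@%H|%an|%at'`, where the `@@` marker is a hypothetical delimiter chosen here, and it assumes that a source-code file is any path ending in `.java`. The sample log is invented for illustration.

```python
from collections import defaultdict

# Assumption: a "source-code file" is any path ending in .java.
SOURCE_SUFFIX = ".java"

# Invented sample in the assumed format: a header line "@@<hash>|<author>|<unix time>"
# followed by the paths touched by that commit.
SAMPLE = """@@a1|alice|1000
src/A.java
README.md

@@b2|bob|2000
src/B.java

@@c3|alice|29800
src/A.java
src/C.java
pom.xml
"""


def parse_log(text):
    """Turn the assumed git log text into a list of commit dicts."""
    commits = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate commits
        if line.startswith("@@"):
            sha, rest = line[2:].split("|", 1)
            author, stamp = rest.rsplit("|", 1)  # author names may contain "|"
            commits.append(
                {"sha": sha, "author": author, "time": int(stamp), "files": []}
            )
        elif commits:
            commits[-1]["files"].append(line)
    return commits


def commit_sizes(commits):
    """Return (total files, source-code files) for each commit."""
    return [
        (len(c["files"]), sum(f.endswith(SOURCE_SUFFIX) for f in c["files"]))
        for c in commits
    ]


def intervals_by_author(commits):
    """Return hours between consecutive commits, grouped by author."""
    times = defaultdict(list)
    for c in commits:
        times[c["author"]].append(c["time"])
    result = {}
    for author, stamps in times.items():
        stamps.sort()
        result[author] = [(b - a) / 3600 for a, b in zip(stamps, stamps[1:])]
    return result


if __name__ == "__main__":
    parsed = parse_log(SAMPLE)
    print(commit_sizes(parsed))
    print(intervals_by_author(parsed))
```

Grouping timestamps by author before taking differences matters here, because the abstract describes intervals per contributor rather than intervals between any two consecutive commits in the repository.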
Pages: 10
Related papers
50 in total
  • [1] Characterizing Logging Practices in Open-Source Software
    Yuan, Ding
    Park, Soyeon
    Zhou, Yuanyuan
    [J]. 2012 34TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2012, : 102 - 112
  • [2] Characterizing Technical Debt in Evolving Open-source Software
    Molnar, Arthur-Jozsef
    Motogna, Simona
    [J]. ENASE: PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON EVALUATION OF NOVEL APPROACHES TO SOFTWARE ENGINEERING, 2022, : 174 - 185
  • [3] MPTHub: An Open-Source Software for Characterizing the Transport of Particles in Biorelevant Media
    Gabriel, Leandro
    Almeida, Helena
    Avelar, Marta
    Sarmento, Bruno
    das Neves, Jose
    [J]. NANOMATERIALS, 2022, 12 (11)
  • [4] Characterizing the Occurrence of Dockerfile Smells in Open-Source Software: An Empirical Study
    Wu, Yiwen
    Zhang, Yang
    Wang, Tao
    Wang, Huaimin
    [J]. IEEE ACCESS, 2020, 8 : 34127 - 34139
  • [5] Open-source software - Introduction
    Sabbah, D
    Frye, D
    [J]. IBM SYSTEMS JOURNAL, 2005, 44 (02)
  • [6] Open-source bioinformatics software
    Vlagioiu, Constantin
    Vuta, Vlad
    Barbuceanu, Florica
    Predoi, Gabriel
    Tudor, Nicolae
    [J]. JOURNAL OF BIOTECHNOLOGY, 2017, 256 : S53 - S53
  • [7] Open-source software for repositories
    Vasilyeva, Natalya V.
    [J]. NAUCHNYE I TEKHNICHESKIE BIBLIOTEKI-SCIENTIFIC AND TECHNICAL LIBRARIES, 2023, (03): : 102 - 119
  • [8] Robust open-source software
    Neumann, PG
    [J]. COMMUNICATIONS OF THE ACM, 1999, 42 (02) : 128 - 128
  • [9] OPEN-SOURCE SOFTWARE IN ROBOTICS
    Timoftei, Sanda
    Brad, Emilia
    Sarb, Anca
    Stan, Ovidiu
    [J]. ACTA TECHNICA NAPOCENSIS SERIES-APPLIED MATHEMATICS MECHANICS AND ENGINEERING, 2018, 61 (03): : 519 - 526
  • [10] Characterizing Women (Not) Contributing To Open-Source
    Wurzelova, Pavlina
    Palomba, Fabio
    Bacchelli, Alberto
    [J]. 2019 IEEE/ACM 2ND INTERNATIONAL WORKSHOP ON GENDER EQUALITY IN SOFTWARE ENGINEERING (GE 2019), 2019, : 5 - 8