Characterizing Commits in Open-Source Software

Authors
Ferreira, Mivian M. [1]
Goncalves, Diego Santos [2]
Bigonha, Mariza A. S. [1]
Ferreira, Kecia A. M. [2]
Affiliations
[1] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
[2] Fed Ctr Technol Educ Minas Gerais, Belo Horizonte, MG, Brazil
Keywords
empirical study; commit; open-source; mining software repositories; Java
DOI
10.1145/3571473.3571508
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Mining software repositories has been the basis of many software engineering studies. Many of these works rely on data extracted from commits, since the commit is the basic unit of information about activities performed on a project. However, not knowing the characteristics of commits may introduce biases and threats in studies that rely on commit data. This work presents an empirical study that characterizes commits in terms of four aspects: the size of commits in total number of files; the size of commits in number of source-code files; the size of commits by category; and the time interval between commits performed by contributors. We analyzed 1M commits from the 24 most popular and active Java-based projects hosted on GitHub. The main findings are: the size of commits follows a heavy-tailed distribution; most commits involve one to ten files; most commits affect one to four source-code files; commits involving hundreds of files are not limited to merge or management activities; the distribution of the time intervals is approximately Normal, i.e., it tends to be symmetric and the mean is representative; and, on average, a developer makes a commit every eight hours. Researchers should consider these results in empirical work to avoid biases when analyzing commit data. In addition, the results provide information that practitioners may apply to improve the management and planning of software activities.
Pages: 10