Cliff Walls: An Analysis of Monolithic Commits Using Latent Dirichlet Allocation

Authors
Pratt, Landon J. [1]
MacLean, Alexander C. [1]
Knutson, Charles D. [1]
Ringger, Eric K. [1]
Affiliations
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
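DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract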
Artifact-based research provides a mechanism whereby researchers may study the creation of software yet avoid many of the difficulties of direct observation and experimentation. However, there are still many challenges that can affect the quality of artifact-based studies, especially those studies examining software evolution. Large commits, which we refer to as "Cliff Walls," are one significant threat to studies of software evolution because they do not appear to represent incremental development. We used Latent Dirichlet Allocation to extract topics from over 2 million commit log messages, taken from 10,000 SourceForge projects. The topics generated through this method were then analyzed to determine the causes of over 9,000 of the largest commits. We found that branch merges, code imports, and auto-generated documentation were significant causes of large commits. We also found that corrective maintenance tasks, such as bug fixes, did not play a significant role in the creation of large commits.
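To make the topic-extraction step in the abstract concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling over a handful of commit log messages. It is not the authors' pipeline: the abstract does not say which LDA implementation, preprocessing, number of topics, or hyperparameters were used, so the `fit_lda` function, the values of `alpha`, `beta`, and `num_topics`, and the six sample messages are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    alpha, beta, iters and seed are illustrative defaults, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k
    assign = []                                           # topic of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Hypothetical commit log messages, invented for this example.
messages = [
    "merge branch feature into trunk",
    "merged branch release into trunk",
    "initial import of vendor library",
    "import third party library source",
    "regenerated javadoc documentation",
    "update generated documentation",
]
docs = [m.lower().split() for m in messages]
doc_topic, topic_word = fit_lda(docs, num_topics=3)

# Show the most frequent words assigned to each topic.
for k, counts in enumerate(topic_word):
    top = [w for w, c in counts.most_common(4) if c > 0]
    print(f"topic {k}: {' '.join(top)}")
```

With a corpus this small the resulting topics are noisy and depend on the seed; the sketch only shows the mechanics of assigning each word in a commit message to a topic, which is the raw material for deciding what kind of activity a large commit represents.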
Pages: 282-298
Page count: 17