Source code analysis with LDA

Cited by: 5
Authors
Binkley, David [1]
Heinz, Daniel [2]
Lawrie, Dawn [1]
Overfelt, Justin [3]
Affiliations
[1] Loyola Univ Maryland, Comp Sci Dept, Baltimore, MD 21210 USA
[2] CNA Financial, Chicago, IL USA
[3] Booz Allen Hamilton, Linthicum, MD USA
Keywords
latent Dirichlet allocation; hyper-parameters; entropy
DOI
10.1002/smr.1802
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Latent Dirichlet allocation (LDA) has seen increasing use in the understanding of source code and its related artifacts in part because of its impressive modeling power. However, this expressive power comes at a cost: The technique includes several tuning parameters whose impact on the resulting LDA model must be carefully considered. The aim of this work is to provide insights into the tuning parameters' impact. Doing so improves the comprehension of both researchers who look to exploit the power of LDA in their research and those who interpret the output of LDA-using tools. It is important to recognize that the goal of this work is not to establish values for the tuning parameters because there is no universal best setting. Rather, appropriate settings depend on the problem being solved, the input corpus (in this case, typically words from the source code and its supporting artifacts), and the needs of the engineer performing the analysis. This work's primary goal is to aid software engineers in their understanding of the LDA tuning parameters by demonstrating numerically and graphically the relationship between the tuning parameters and the LDA output. A secondary goal is to enable more informed setting of the parameters. Copyright (C) 2016 John Wiley & Sons, Ltd.
Pages: 893-920
Page count: 28