Configuring latent Dirichlet allocation based feature location

Cited by: 57
Authors
Biggers, Lauren R. [1]
Bocovich, Cecylia [2]
Capshaw, Riley [3]
Eddy, Brian P. [1]
Etzkorn, Letha H. [4]
Kraft, Nicholas A. [1]
Affiliations
[1] Univ Alabama, Dept Comp Sci, Tuscaloosa, AL 35487 USA
[2] Macalester Coll, Dept Math Stat & Comp Sci, St Paul, MN 55105 USA
[3] Hendrix Coll, Dept Math & Comp Sci, Conway, AR USA
[4] Univ Alabama, Dept Comp Sci, Huntsville, AL 35899 USA
Funding
US National Science Foundation;
Keywords
Software evolution; Program comprehension; Feature location; Static analysis; Text retrieval; CODE; RETRIEVAL; COHESION;
DOI
10.1007/s10664-012-9224-x
Chinese Library Classification number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Feature location is a program comprehension activity, the goal of which is to identify source code entities that implement a functionality. Recent feature location techniques apply text retrieval models such as latent Dirichlet allocation (LDA) to corpora built from text embedded in source code. These techniques are highly configurable, and the literature offers little insight into how different configurations affect their performance. In this paper we present a study of an LDA based feature location technique (FLT) in which we measure the performance effects of using different configurations to index corpora and to retrieve 618 features from 6 open source Java systems. In particular, we measure the effects of the query, the text extractor configuration, and the LDA parameter values on the accuracy of the LDA based FLT. Our key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context. Based on the results of our case study, we offer specific recommendations for configuring the LDA based FLT.
Pages: 465 - 500
Page count: 36
Related papers
50 in total
  • [1] Configuring latent Dirichlet allocation based feature location
    Lauren R. Biggers
    Cecylia Bocovich
    Riley Capshaw
    Brian P. Eddy
    Letha H. Etzkorn
    Nicholas A. Kraft
    Empirical Software Engineering, 2014, 19 : 465 - 500
  • [2] Impact of structural weighting on a latent Dirichlet allocation-based feature location technique
    Eddy, Brian P.
    Kraft, Nicholas A.
    Gray, Jeff
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2018, 30 (01)
  • [3] The Effects of Identifier Retention and Stop Word Removal on a Latent Dirichlet Allocation Based Feature Location Technique
    Biggers, Lauren R.
    PROCEEDINGS OF THE 50TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE, 2012
  • [4] Unsupervised Feature Selection for Latent Dirichlet Allocation
    Xu Weiran
    Du Gang
    Chen Guang
    Guo Jun
    Yang Jie
    CHINA COMMUNICATIONS, 2011, 8 (05) : 54 - 62
  • [5] Aurora Image Classification Based on Multi-Feature Latent Dirichlet Allocation
    Zhong, Yanfei
    Huang, Rui
    Zhao, Ji
    Zhao, Bei
    Liu, Tingting
    REMOTE SENSING, 2018, 10 (02)
  • [6] Feature extraction for document text using Latent Dirichlet Allocation
    Prihatini, P. M.
    Suryawan, I. K.
    Mandia, I. N.
    2ND INTERNATIONAL JOINT CONFERENCE ON SCIENCE AND TECHNOLOGY (IJCST) 2017, 2018, 953
  • [7] Feature Substitution Using Latent Dirichlet Allocation for Text Classification
    Mathivanan, Norsyela Muhammad Noor
    Janor, Roziah Mohd
    Abd Razak, Shukor
    Ghani, Nor Azura Md.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (01) : 1087 - 1098
  • [8] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [9] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 601 - 608
  • [10] Latent Dirichlet Allocation Based Multilevel Classification
    Bhutada, Sunil
    Balaram, V. V. S. S. S.
    Bulusu, Vishnu Vardhan
    2014 INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICCICCT), 2014 : 1020 - 1024