Configuring latent Dirichlet allocation based feature location

Cited by: 57
Authors
Biggers, Lauren R. [1]
Bocovich, Cecylia [2]
Capshaw, Riley [3]
Eddy, Brian P. [1]
Etzkorn, Letha H. [4]
Kraft, Nicholas A. [1]
Affiliations
[1] Univ Alabama, Dept Comp Sci, Tuscaloosa, AL 35487 USA
[2] Macalester Coll, Dept Math Stat & Comp Sci, St Paul, MN 55105 USA
[3] Hendrix Coll, Dept Math & Comp Sci, Conway, AR USA
[4] Univ Alabama, Dept Comp Sci, Huntsville, AL 35899 USA
Funding
National Science Foundation (USA);
Keywords
Software evolution; Program comprehension; Feature location; Static analysis; Text retrieval; CODE; RETRIEVAL; COHESION;
DOI
10.1007/s10664-012-9224-x
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Feature location is a program comprehension activity, the goal of which is to identify source code entities that implement a functionality. Recent feature location techniques apply text retrieval models such as latent Dirichlet allocation (LDA) to corpora built from text embedded in source code. These techniques are highly configurable, and the literature offers little insight into how different configurations affect their performance. In this paper we present a study of an LDA based feature location technique (FLT) in which we measure the performance effects of using different configurations to index corpora and to retrieve 618 features from 6 open source Java systems. In particular, we measure the effects of the query, the text extractor configuration, and the LDA parameter values on the accuracy of the LDA based FLT. Our key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context. Based on the results of our case study, we offer specific recommendations for configuring the LDA based FLT.
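The text extractor configuration mentioned in the abstract can be illustrated with a minimal sketch, assuming a regex-based extractor that turns one Java method into a bag of terms. The names `ExtractorConfig`, `split_terms`, and `extract_document` are hypothetical and do not come from the paper; this record does not describe the study's actual extractor, and the keyword list below is deliberately partial.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorConfig:
    """Hypothetical switches for which text sources enter the corpus."""
    include_identifiers: bool = True
    include_comments: bool = True
    include_literals: bool = True


# Simplified patterns (assumption): a "//" inside a string literal would be
# misread as a comment, which a real lexer would handle correctly.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
LITERAL_RE = re.compile(r'"(?:\\.|[^"\\])*"')
WORD_RE = re.compile(r"[A-Za-z]+")
CAMEL_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")

# Partial list of Java keywords, enough for the sample below.
JAVA_KEYWORDS = frozenset(
    {"public", "private", "protected", "static", "final", "void", "int",
     "boolean", "class", "return", "new", "if", "else", "for", "while"}
)


def split_terms(text, skip=frozenset()):
    """Split words on camelCase boundaries and lowercase the parts."""
    terms = []
    for word in WORD_RE.findall(text):
        if word in skip:
            continue
        terms.extend(part.lower() for part in CAMEL_RE.findall(word))
    return terms


def extract_document(source, config):
    """Build the term list for one method under the given configuration."""
    comments = COMMENT_RE.findall(source)
    code = COMMENT_RE.sub(" ", source)
    literals = LITERAL_RE.findall(code)
    code = LITERAL_RE.sub(" ", code)

    terms = []
    if config.include_identifiers:
        terms.extend(split_terms(code, skip=JAVA_KEYWORDS))
    if config.include_comments:
        for comment in comments:
            terms.extend(split_terms(comment))
    if config.include_literals:
        for literal in literals:
            terms.extend(split_terms(literal))
    return terms


SAMPLE_METHOD = """
/* Saves the open document. */
public void saveFile(String path) {
    log("Writing buffer"); // flush first
    writer.flushAll();
}
"""

identifiers_only = ExtractorConfig(include_comments=False, include_literals=False)
print(extract_document(SAMPLE_METHOD, identifiers_only))
print(extract_document(SAMPLE_METHOD, ExtractorConfig()))
```

With comments and literals switched off, words such as "document" and "buffer" never reach the corpus, so a query that uses them has nothing to match in this method. That is one plausible reading of why the abstract reports lower accuracy when comments and literals are excluded.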
Pages: 465-500
Number of pages: 36