Bug localization using latent Dirichlet allocation

Cited by: 219
Authors
Lukins, Stacy K. [1]
Kraft, Nicholas A. [2]
Etzkorn, Letha H. [1]
Affiliations
[1] Univ Alabama, Dept Comp Sci, Huntsville, AL 35899 USA
[2] Univ Alabama, Dept Comp Sci, Tuscaloosa, AL 35487 USA
Funding
US National Science Foundation
Keywords
Bug localization; Program comprehension; Latent Dirichlet allocation; Information retrieval; Design instability
DOI
10.1016/j.infsof.2010.04.002
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown effective in topic model based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness.

Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods.

Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems.

Results: The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results of the studies also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base.

Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA. We also conclude that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.

© 2010 Elsevier B.V. All rights reserved.
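The abstract does not spell out the retrieval step, so the following is only a minimal sketch of one common way to use LDA for retrieval, not the authors' implementation. It assumes each source-code unit has already been reduced to a list of tokens, fits LDA with a small collapsed Gibbs sampler, and ranks the units by the log-likelihood of the bug-report terms under each unit's topic mixture. The toy corpus, the method names, the query, and all hyperparameter values are hypothetical.

```python
import math
import random


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, index)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimates of P(topic | doc) and P(word | topic).
    theta = [
        [(doc_topic[d][k] + alpha) / (len(docs[d]) + K * alpha) for k in range(K)]
        for d in range(len(docs))
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, index


def rank_documents(query, names, theta, phi, index):
    """Rank documents by log P(query | doc) = sum_w log sum_k theta[d][k] * phi[k][w]."""
    scores = []
    for name, doc_theta in zip(names, theta):
        log_p = 0.0
        for w in query:
            if w not in index:
                continue  # out-of-vocabulary query terms are ignored
            v = index[w]
            log_p += math.log(sum(doc_theta[k] * phi[k][v] for k in range(len(phi))))
        scores.append((log_p, name))
    return [name for _, name in sorted(scores, reverse=True)]


# Hypothetical toy corpus: one token list per method.
names = ["Parser.parseHeader", "Cache.evict", "Parser.readToken", "Cache.put"]
docs = [
    ["parse", "header", "token", "stream", "parse"],
    ["cache", "evict", "entry", "size", "cache"],
    ["read", "token", "stream", "parse", "token"],
    ["cache", "put", "entry", "key", "size"],
]
theta, phi, index = fit_lda(docs, num_topics=2)
ranking = rank_documents(["cache", "entry", "size"], names, theta, phi, index)
print(ranking)
```

The ranking depends on the sampler's random seed and on the number of topics, so the printed order on such a small corpus should be read as illustrative only.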
Pages: 972-990
Number of pages: 19