Just enough semantics: An information theoretic approach for IR-based software bug localization

Cited by: 30
Authors
Khatiwada, Saket [1]
Tushev, Miroslav [1]
Mahmoud, Anas [1]
Affiliations
[1] Louisiana State Univ, Div Comp Sci & Engn, Baton Rouge, LA 70803 USA
Keywords
Information retrieval; Bug localization; Information theory; LATENT DIRICHLET ALLOCATION; SOURCE-CODE; TRACEABILITY LINKS; RETRIEVAL; LOCATION; REPRESENTATIONS; DOCUMENTATION; COMPREHENSION; EVOLUTION; KNOWLEDGE
DOI
10.1016/j.infsof.2017.08.012
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Software systems are often shipped with defects. Whenever a bug is reported, developers use the information available in the associated report to locate the source code fragments that need to be modified to fix the bug. However, as software systems grow in size and complexity, bug localization can become a tedious and time-consuming process. To minimize the manual effort, contemporary bug localization tools use Information Retrieval (IR) methods for automated support. IR methods exploit the textual content of bug reports to automatically capture and rank relevant buggy source files.
Objective: In this paper, we propose a new paradigm of information-theoretic IR methods to support bug localization tasks in software systems. These methods, including Pointwise Mutual Information (PMI) and Normalized Google Distance (NGD), exploit the co-occurrence patterns of code terms in the software system to reveal hidden textual semantic dimensions that other methods often fail to capture. Our objective is to establish accurate semantic similarity relations between source code and bug reports.
Method: Five benchmark datasets from different application domains are used to conduct our analysis. The proposed methods are compared against classical IR methods that are commonly used in bug localization research.
Results: The results show that information-theoretic IR methods significantly outperform classical IR methods, providing a semantically aware yet computationally efficient solution for bug localization in large and complex software systems. (A replication package is available at: http://seel.cse.lsu.edu/data/ist17.zip).
Conclusions: Information-theoretic co-occurrence methods provide "just enough semantics" necessary to establish relations between bug reports and code artifacts, achieving a balance between simple lexical methods and computationally expensive semantic IR methods that require substantial amounts of data to function properly.
(c) 2017 Elsevier B.V. All rights reserved.
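The abstract names PMI and NGD as co-occurrence measures but this record does not give their formulas. As a rough illustration only, the sketch below uses the standard textbook definitions of both measures, estimated from document-level co-occurrence counts. The function names, the toy corpus, and the choice of "document" as the co-occurrence window are assumptions made for this example, not details taken from the paper.

```python
import math
from itertools import combinations


def doc_frequencies(docs):
    """Count in how many documents each term, and each unordered term pair, occurs."""
    single, pair = {}, {}
    for doc in docs:
        terms = set(doc)
        for term in terms:
            single[term] = single.get(term, 0) + 1
        for a, b in combinations(sorted(terms), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair


def pmi(w1, w2, single, pair, n_docs):
    """Pointwise Mutual Information: log2( P(w1, w2) / (P(w1) * P(w2)) )."""
    joint = pair.get(tuple(sorted((w1, w2))), 0)
    if joint == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2(joint * n_docs / (single[w1] * single[w2]))


def ngd(w1, w2, single, pair, n_docs):
    """Normalized Google Distance; 0 means the terms always co-occur."""
    joint = pair.get(tuple(sorted((w1, w2))), 0)
    if joint == 0:
        return float("inf")  # the terms never co-occur
    log_x, log_y = math.log(single[w1]), math.log(single[w2])
    denominator = math.log(n_docs) - min(log_x, log_y)
    if denominator == 0:
        return 0.0  # both terms occur in every document
    return (max(log_x, log_y) - math.log(joint)) / denominator


# Hypothetical toy corpus: each inner list stands for the terms of one source file.
docs = [
    ["file", "open", "read"],
    ["file", "open", "close"],
    ["socket", "connect"],
    ["socket", "read"],
]
single, pair = doc_frequencies(docs)
n_docs = len(docs)

pmi_file_open = pmi("file", "open", single, pair, n_docs)
ngd_file_open = ngd("file", "open", single, pair, n_docs)
pmi_file_socket = pmi("file", "socket", single, pair, n_docs)
```

In this toy corpus "file" and "open" always appear together, so their PMI is positive and their NGD is zero, while "file" and "socket" never co-occur and fall at the opposite extreme. How such pairwise term scores would be aggregated into a similarity between a bug report and a source file is not described in this record and is left out of the sketch.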
Pages: 45-57
Page count: 13