The forgotten role of search queries in IR-based bug localization: an empirical study

Authors
Mohammad Masudur Rahman
Foutse Khomh
Shamima Yeasmin
Chanchal K. Roy
Affiliations
[1] Dalhousie University
[2] Polytechnique Montréal
[3] University of Saskatchewan
Source
Empirical Software Engineering, 2021, 26 (06)
Keywords
Debugging automation; Bug localization; Information retrieval; Natural language processing; Query construction; Keyword selection; Genetic algorithm; Optimal search query; Poor search query; Empirical study
DOI
Not available
Abstract
Being lightweight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches depends heavily on the bug reports they use as search queries. A significant number of bug reports contain only plain natural language text, and according to existing studies, IR-based approaches do not perform well when such reports are used as queries. Recent evidence, however, suggests that even these natural language-only reports contain enough good keywords to localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source of good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we clarify this issue by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, use a Genetic Algorithm-based approach to construct optimal and near-optimal search queries from these bug reports, and then answer three research questions. We confirm that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports. However, these bug reports do contain high-quality search keywords in their texts, even though they might not contain explicit hints for localizing bugs (e.g., stack traces). We also demonstrate that optimal and non-optimal queries chosen from bug report texts differ significantly in several keyword characteristics (e.g., frequency, entropy, position, part of speech). This analysis leads us to four actionable insights on how to choose appropriate keywords from a bug report. Furthermore, we demonstrate a 27%–34% improvement in the performance of non-optimal queries when these insights are applied to them. Finally, we summarize our findings and outline future research directions (e.g., machine intelligence in keyword selection).
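The abstract mentions a Genetic Algorithm-based construction of optimal and near-optimal queries but gives no details of it. The following is only a minimal sketch of the general idea, not the authors' method: each candidate query is a bit mask over the keywords of a bug report, and its fitness is the reciprocal rank of the known buggy file under a simple TF-IDF score. The function names (rank_of_target, evolve_query), the scoring formula, the selection and mutation settings, and the toy corpus are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def rank_of_target(query, docs, target):
    """1-based rank of `target` when documents are scored by a toy TF-IDF sum.

    Ties are resolved pessimistically: a document tying with the target
    is counted as ranked above it.
    """
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        scores[name] = sum(
            tf[t] * math.log(1 + n / df[t]) for t in set(query) if t in tf
        )
    return 1 + sum(
        1 for name in docs if name != target and scores[name] >= scores[target]
    )


def evolve_query(keywords, docs, target, generations=30, pop_size=20, seed=0):
    """Evolve a keyword subset; assumes at least two keywords and pop_size >= 4."""
    rng = random.Random(seed)

    def fitness(mask):
        query = [k for k, bit in zip(keywords, mask) if bit]
        return 1.0 / rank_of_target(query, docs, target) if query else 0.0

    # Start from the full-report baseline plus random keyword subsets.
    population = [[1] * len(keywords)]
    population += [[rng.randint(0, 1) for _ in keywords] for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(keywords))  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional single-bit mutation
                child[rng.randrange(len(child))] ^= 1
            children.append(child)
        population = parents + children

    best = max(population, key=fitness)
    return [k for k, bit in zip(keywords, best) if bit], fitness(best)


# Hypothetical toy corpus: file name -> tokens extracted from the source file.
docs = {
    "Parser.java": ["parse", "token", "error", "stream", "parse"],
    "Logger.java": ["log", "error", "message", "file", "error", "error", "error"],
    "Cache.java": ["cache", "evict", "entry", "error"],
}
keywords = ["error", "message", "parse", "token", "file"]  # from a bug report

baseline_rank = rank_of_target(keywords, docs, "Parser.java")
best_query, best_fitness = evolve_query(keywords, docs, "Parser.java")
print(baseline_rank, best_query, best_fitness)
```

Because the full keyword set is placed in the initial population and the best individuals are carried over unchanged, the evolved query in this sketch never scores worse than the whole-report baseline. Note that such a search needs the ground-truth buggy file, so it can serve only as an analysis tool for studying what good queries look like, not as a localization technique on unseen bugs.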
Related papers
  • [1] The forgotten role of search queries in IR-based bug localization: an empirical study
    Rahman, Mohammad Masudur
    Khomh, Foutse
    Yeasmin, Shamima
    Roy, Chanchal K.
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2021, 26 (06)
  • [2] An Empirical Study of IR-based Bug Localization for Deep Learning-based Software
    Kim, Misoo
    Kim, Youngkyoung
    Lee, Eunseok
    [J]. 2022 IEEE 15TH INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION (ICST 2022), 2022, : 128 - 139
  • [3] An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects
    Li, Wei
    Li, Qingan
    Ming, Yunlong
    Dai, Weijiao
    Ying, Shi
    Yuan, Mengting
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2022, 27 (02)
  • [5] Predicting Effectiveness of IR-Based Bug Localization Techniques
    Le, Tien-Duy B.
    Thung, Ferdian
    Lo, David
    [J]. 2014 IEEE 25TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE), 2014, : 335 - 345
  • [6] Influence of Structured Information in Bug Report Descriptions on IR-based Bug Localization
    Rath, Michael
    Maeder, Patrick
    [J]. 44TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2018), 2018, : 26 - 32
  • [7] Structured information in bug report descriptions—influence on IR-based bug localization and developers
    Michael Rath
    Patrick Mäder
    [J]. Software Quality Journal, 2019, 27 : 1315 - 1337
  • [8] Bench4BL: Reproducibility Study on the Performance of IR-Based Bug Localization
    Lee, Jaekwon
    Kim, Dongsun
    Bissyande, Tegawende F.
    Jung, Woosung
    Le Traon, Yves
    [J]. ISSTA'18: PROCEEDINGS OF THE 27TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, 2018, : 61 - 72
  • [9] A Novel Approach to Automatic Query Reformulation for IR-based Bug Localization
    Kim, Misoo
    Lee, Eunseok
    [J]. SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 1752 - 1759