The forgotten role of search queries in IR-based bug localization: an empirical study

Cited by: 0

Authors:

Mohammad Masudur Rahman

Foutse Khomh

Shamima Yeasmin

Chanchal K. Roy

Affiliations:

[1] Dalhousie University

[2] Polytechnique Montréal

[3] University of Saskatchewan

Source:

Empirical Software Engineering | 2021 / Vol. 26, No. 6

Keywords:

Debugging automation; Bug localization; Information retrieval; Natural language processing; Query construction; Keyword selection; Genetic algorithm; Optimal search query; Poor search query; Empirical study

DOI:

Not available

Abstract:

Being lightweight and cost-effective, IR-based approaches to bug localization have shown promise in finding software bugs. However, their accuracy depends heavily on the bug reports they use as input. A significant number of bug reports contain only plain natural-language text, and according to existing studies, IR-based approaches perform poorly when these reports are used as search queries. Yet recent evidence suggests that even these natural-language-only reports contain enough good keywords to localize the bugs successfully. These findings suggest that natural-language-only bug reports might be a sufficient source of good query keywords, but they also cast serious doubt on current query selection practices in IR-based bug localization. In this article, we attempt to resolve this question through an in-depth empirical study that critically examines state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, use a Genetic Algorithm-based approach to construct optimal or near-optimal search queries from these bug reports, and then answer three research questions. We confirm that state-of-the-art query construction approaches are indeed insufficient for constructing appropriate queries (for bug localization) from certain natural-language-only bug reports. However, these bug reports do contain high-quality search keywords in their texts, even though they might lack explicit hints for localizing bugs (e.g., stack traces). We also demonstrate that optimal and non-optimal queries chosen from bug report texts differ significantly in several keyword characteristics (e.g., frequency, entropy, position, part of speech).
This analysis led us to four actionable insights on how to choose appropriate keywords from a bug report. Furthermore, we demonstrate a 27%–34% improvement in the performance of non-optimal queries by applying these actionable insights to them. Finally, we summarize our findings and outline future research directions (e.g., machine intelligence in keyword selection).
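The abstract names a Genetic Algorithm-based approach for building optimal or near-optimal queries but does not give its encoding, fitness function, or parameters. Purely as an illustration, the sketch below assumes that each candidate query is a bit string over the distinct words of a bug report, that fitness is the rank of the known buggy file under a simple TF-IDF term-overlap ranking (lower is better), and that each generation keeps the better half of the population. The corpus, file names, and the functions `tfidf_scores`, `rank_of`, and `genetic_query` are all hypothetical and are not the authors' implementation.

```python
import math
import random
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document by the summed TF-IDF weight of the query terms it contains."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        scores[name] = sum(tf[t] * math.log((n + 1) / (df[t] + 1)) for t in set(query))
    return scores


def rank_of(buggy, query, docs):
    """1-based rank of the buggy file; ties are broken alphabetically by file name."""
    scores = tfidf_scores(query, docs)
    ordered = sorted(docs, key=lambda d: (-scores[d], d))
    return ordered.index(buggy) + 1


def genetic_query(candidates, buggy, docs, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search keyword subsets (bit strings over `candidates`, at least 2 words)
    for one that minimizes the buggy file's rank (hypothetical fitness)."""
    rng = random.Random(seed)

    def fitness(bits):
        query = [t for t, keep in zip(candidates, bits) if keep]
        if not query:
            return len(docs) + 1  # an empty query is treated as worse than any real rank
        return rank_of(buggy, query, docs)

    population = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # truncation selection; best individuals survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(candidates))  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = elite + children
    best = min(population, key=fitness)
    return [t for t, keep in zip(candidates, best) if keep], fitness(best)


# Toy corpus of tokenized source files (hypothetical).
docs = {
    "Parser.java": "parse token stream syntax error recover token".split(),
    "Renderer.java": "render canvas draw pixel color buffer".split(),
    "Cache.java": "cache evict entry memory size limit".split(),
    "Main.java": "main start app window render parse".split(),
}
report = "app crashes with syntax error when parse recovers from bad token"
candidates = sorted(set(report.split()))
query, rank = genetic_query(candidates, "Parser.java", docs)
print(query, rank)
```

Because the elite half is carried over unchanged, the best query found so far is never lost between generations. A real study would rank files from an actual codebase with a full IR engine; the toy TF-IDF ranking here only stands in for that step.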
