An Empirical Study on the Stability of Explainable Software Defect Prediction

Cited by: 1
Authors
Shin, Jiho [1 ]
Aleithan, Reem [1 ]
Nam, Jaechang [2 ]
Wang, Junjie [3 ]
Harzevili, Nima Shiri [1 ]
Wang, Song [1 ]
Affiliations
[1] York Univ, Toronto, ON, Canada
[2] Handong Global Univ, Pohang, South Korea
[3] Chinese Acad Sci, Inst Software, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2023 30TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, APSEC 2023 | 2023
Keywords
Software bugs; static detection; machine learning libraries; faults; models
DOI
10.1109/APSEC60848.2023.00024
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Explaining the results of software defect prediction (SDP) models is useful in practice but challenging. Jiarpakdee et al. proposed using two model-agnostic techniques (i.e., LIME and BreakDown) to explain prediction results. They showed that model-agnostic techniques can achieve remarkable performance and that the generated explanations can assist developers in understanding the prediction results. However, they examined these techniques only under one specific SDP setting, which calls their reliability under other settings into question. In this paper, we investigate the reliability and stability of model-agnostic explanation generation approaches on SDP models under different settings, e.g., different data sampling techniques, machine learning classifiers, and prediction scenarios used when building SDP models. We use the model-agnostic techniques to generate explanations for the same test instance under SDP models built with different settings and then check whether the generated explanations remain stable. Our experiments reuse the defect data and experiment configurations of Jiarpakdee et al. The results show that the examined model-agnostic techniques generate inconsistent explanations for the same test instances under different SDP settings. Our user study further confirms that inconsistent explanations significantly affect developers' understanding of the prediction results, which implies that model-agnostic techniques can be unreliable for generating explanations in practice. We therefore urge a revisit of existing model-agnostic-based studies in software engineering and call for more research on explainable SDP toward stable explanation generation.
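The abstract describes a concrete procedure: train SDP models under different settings, explain the same test instance with a model-agnostic technique, and compare the resulting explanations. The following is a minimal, hypothetical Python sketch of that idea, not the authors' code; it assumes scikit-learn and the lime package, and the metric names, synthetic data, and top-k feature-overlap measure are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from lime.lime_tabular import LimeTabularExplainer

# Synthetic stand-in for a defect dataset (hypothetical metric names).
rng = np.random.default_rng(0)
feature_names = ["loc", "cyclomatic", "churn", "num_authors"]
X = rng.random((200, len(feature_names)))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)  # synthetic "defective" labels

# Two of the "different settings" from the abstract: different classifiers.
classifiers = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "lr": LogisticRegression(max_iter=1000),
}

explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["clean", "defective"],
    mode="classification", random_state=0,
)

instance = X[0]  # the SAME test instance under every setting
top_k = 3
top_features = {}
for name, clf in classifiers.items():
    clf.fit(X, y)
    exp = explainer.explain_instance(instance, clf.predict_proba, num_features=top_k)
    # as_map() yields (feature_index, weight) pairs for the positive label.
    top_features[name] = [feature_names[i] for i, _w in exp.as_map()[1]]

# Stability proxy: overlap of the top-k explanation features across settings.
overlap = set(top_features["rf"]) & set(top_features["lr"])
print(f"top-{top_k} overlap: {len(overlap)}/{top_k} -> {sorted(overlap)}")

In this sketch, a low top-k overlap across classifiers would mirror the paper's finding: for the same instance, different SDP settings can yield diverging explanations.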
Pages: 141-150
Page count: 10