Domain Adaptation Approach for Cross-project Software Defect Prediction

Cited by: 0

Authors:

Chen S. [1]

Ye J.-M. [1]

Liu T. [1]

Affiliations:

[1] School of Computer, Central China Normal University, Wuhan

Source:

Ruan Jian Xue Bao/Journal of Software | 2020 / Vol. 31 / No. 2

Keywords:

Domain adaptation; Machine learning; Software defect metrics; Software defect prediction; Transfer learning;

DOI:

10.13328/j.cnki.jos.005632

Abstract:

Software defect prediction supports software quality control at a very early stage by helping software engineers focus their attention on defect-prone parts during verification. Cross-project defect prediction has been proposed, in which prediction models are trained on sufficient data from existing software projects and then used to predict defects in other projects; however, its performance is always poor. The main reason is that the divergence in data distribution among different software projects severely degrades prediction accuracy. This study proposes an approach to cross-project defect prediction that applies supervised domain adaptation based on instance weighting. The instances drawn from a source project are weighted by assigning target-dependent weights to the loss function of the prediction model when minimizing the expected loss over the source data distribution, so that the weighted source data match the distribution properties of the target project data. The experiments, including dataset selection, data preprocessing, and results, are described under different experimental strategies on ten open-source software projects. Overfitting is also studied at several levels: the dataset, the prediction model, and the domain adaptation process. The results show that the proposed approach performs close to within-project defect prediction, better than similar approaches, and significantly better than the baseline. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
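The instance-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which prediction model or weight estimator the authors use, so everything below is an assumption made for illustration: the model is a plain logistic regression, the target-dependent weights come from a Gaussian kernel similarity between each source instance and a handful of target instances, and all function names (`similarity_weights`, `fit_weighted_logistic`, `predict_proba`) are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def similarity_weights(source_X, target_X, bandwidth=1.0):
    """Assumed weighting scheme (not from the paper): the mean Gaussian
    kernel similarity of each source instance to the target instances,
    rescaled so that the weights average to 1."""
    raw = []
    for x in source_X:
        total = 0.0
        for t in target_X:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        raw.append(total / len(target_X))
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def fit_weighted_logistic(X, y, weights, lr=0.1, epochs=500):
    """Minimize the weighted average log loss over the source instances
    with batch gradient descent. Each instance's loss term is scaled by
    its weight, which is the core of instance weighting."""
    n_features = len(X[0])
    coef = [0.0] * n_features
    bias = 0.0
    total_w = sum(weights)
    for _ in range(epochs):
        grad_coef = [0.0] * n_features
        grad_bias = 0.0
        for x, label, w in zip(X, y, weights):
            p = sigmoid(sum(c * v for c, v in zip(coef, x)) + bias)
            err = w * (p - label)  # weighted gradient of the log loss
            for j, v in enumerate(x):
                grad_coef[j] += err * v
            grad_bias += err
        coef = [c - lr * g / total_w for c, g in zip(coef, grad_coef)]
        bias -= lr * grad_bias / total_w
    return coef, bias


def predict_proba(coef, bias, x):
    # Predicted probability that instance x is defective.
    return sigmoid(sum(c * v for c, v in zip(coef, x)) + bias)


# Toy usage: one metric per instance, label 1 = defective.
source_X = [[0.0], [1.0], [4.0], [5.0]]
source_y = [0, 0, 1, 1]
target_X = [[4.0], [5.0]]  # a few instances from the target project

weights = similarity_weights(source_X, target_X)
coef, bias = fit_weighted_logistic(source_X, source_y, weights)
```

In this sketch, source instances that resemble the target project contribute more to the training loss, while dissimilar ones are down-weighted rather than discarded. Setting every weight to 1 recovers ordinary unweighted training on the source project.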

Pages: 266-281

Page count: 15
