Domain Adaptation Approach for Cross-project Software Defect Prediction

Cited by: 0
Authors
Chen S. [1]
Ye J.-M. [1]
Liu T. [1]
Affiliation
[1] School of Computer, Central China Normal University, Wuhan
Source
Ruan Jian Xue Bao/Journal of Software | 2020 / Vol. 31 / No. 2
Keywords
Domain adaptation; Machine learning; Software defect metrics; Software defect prediction; Transfer learning;
DOI
10.13328/j.cnki.jos.005632
Abstract
Software defect prediction operates at a very early stage of software quality control and helps software engineers focus their attention on defect-prone parts during verification. Cross-project defect prediction trains prediction models on sufficient data from existing software projects and predicts defects in other projects; however, its performance is often poor. The main reason is that differences in data distribution among software projects severely degrade prediction accuracy. This study proposes a cross-project defect prediction approach that applies supervised domain adaptation based on instance weighting. The plentiful instances drawn from a source project are weighted by assigning target-dependent weights to the loss function of the prediction model when minimizing the expected loss over the distribution of the source data, so that the distribution properties of the target project data can be matched to those of the source project. Experiments, including dataset selection, data preprocessing, and results, are described under different experimental strategies on ten open-source software projects. Overfitting is also studied at several levels: the dataset, the prediction model, and the domain adaptation process. The results show that the proposed approach performs close to within-project defect prediction, better than a similar approach, and significantly better than the baseline. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
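The instance-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the target-dependent weights are computed or which prediction model is used, so the Gaussian-kernel weighting in `target_similarity_weights`, the logistic-regression model, and all function and parameter names below are assumptions made for illustration only. The sketch shows only the general pattern of multiplying each source instance's loss by a weight that depends on the target data.

```python
import math


def target_similarity_weights(src, tgt, bandwidth=1.0):
    """Hypothetical weighting rule (an assumption, not from the paper):
    each source instance is weighted by its mean Gaussian-kernel
    similarity to the target instances, normalised to average 1."""
    weights = []
    for x in src:
        sims = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, t))
                     / (2.0 * bandwidth ** 2))
            for t in tgt
        ]
        weights.append(sum(sims) / len(sims))
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


def sigmoid(z):
    # Clamp the input to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logreg(X, y, weights, lr=0.1, epochs=500):
    """Gradient descent on the weighted logistic loss
    sum_i w_i * loss(x_i, y_i) / sum_i w_i.
    The last entry of the returned list is the bias term."""
    n_features = len(X[0])
    theta = [0.0] * (n_features + 1)
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, label, w in zip(X, y, weights):
            p = sigmoid(theta[-1] + sum(t * v for t, v in zip(theta, x)))
            err = w * (p - label)  # the instance weight scales the gradient
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        theta = [t - lr * g / total for t, g in zip(theta, grad)]
    return theta


def predict_proba(theta, x):
    """Predicted probability that instance x is defect-prone."""
    return sigmoid(theta[-1] + sum(t * v for t, v in zip(theta, x)))
```

In this sketch, source instances that lie close to the target project's instances receive larger weights and therefore contribute more to the fitted model; any other weighting rule could be substituted without changing `fit_weighted_logreg`.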
Pages: 266-281
Number of pages: 15