A training sample selection method for predicting software defects

Author
Cong Jin
Affiliation
[1] Central China Normal University, School of Computer
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Software defect prediction; Sample contribution; Sample selection; Predictive performance;
Abstract
Software defect prediction (SDP) is an important method for analyzing software quality and reducing development cost. Data from the software life cycle have been widely used to predict the defect proneness of software modules, and although many machine learning-based SDP models have been proposed, their predictive performance is not always satisfactory. Traditional machine learning-based classifiers usually assume that all samples contribute equally to the training of an SDP model, which is not true. In fact, different training samples affect the performance of the SDP model differently, and the performance of machine learning-based SDP models depends heavily on the quality of the training samples. To address this shortcoming of traditional machine learning-based classifiers, this paper makes the following contributions: (1) Inspired by clustering algorithms, a method is proposed to calculate the contribution of each training sample to the SDP model. It not only considers the relationship between the contributions of the training samples to the SDP model but also analyzes the influence of the distance between a sample and the category boundary on the performance of the SDP model, which distinguishes it from existing methods for calculating sample contribution. (2) A sample selection (SS) method is proposed to improve the performance of the SDP model. It first calculates the contribution of each training sample based on several nearest neighbors of the sample and the label information of these neighbors, and then performs SS according to the Hoeffding probability inequality and the contribution of each sample. Experimental results are presented to confirm the validity of the proposed SDP model. Both direct observation and statistical tests of the experimental results show that the SS method is very effective in improving the predictive performance of the SDP model.
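The abstract gives only the outline of the SS procedure (a neighbor-based contribution score followed by a Hoeffding-based selection step), so the following Python sketch is an illustration under stated assumptions, not the paper's algorithm. The score used here (the share of a sample's k nearest neighbors that carry the same label), the 0.5 baseline, the treatment of neighbor labels as independent draws, and the names neighbor_agreement, hoeffding_margin and select_samples are all hypothetical stand-ins.

```python
import numpy as np


def neighbor_agreement(X, y, k=10):
    """Hypothetical contribution score: the fraction of each sample's
    k nearest neighbors (Euclidean distance) that share its label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y[nearest] == y[:, None]).mean(axis=1)


def hoeffding_margin(k, delta=0.1):
    """One-sided Hoeffding bound for the mean of k independent values
    in [0, 1]: P(E[mean] - mean >= eps) <= exp(-2 * k * eps**2).
    Solving exp(-2 * k * eps**2) = delta for eps gives the margin."""
    return float(np.sqrt(np.log(1.0 / delta) / (2.0 * k)))


def select_samples(X, y, k=10, delta=0.1, baseline=0.5):
    """Keep a sample unless its score is so low that, with confidence
    1 - delta, its expected agreement is below the assumed baseline."""
    scores = neighbor_agreement(X, y, k)
    eps = hoeffding_margin(k, delta)
    keep = scores + eps >= baseline
    return keep, scores
```

Called as keep, scores = select_samples(X, y), the function returns a boolean mask over the training samples together with the per-sample scores; the retained subset X[keep], y[keep] would then be passed to whichever classifier serves as the SDP model. Because the margin shrinks only as 1/sqrt(k), a small k makes the bound too loose to reject anything, which is why the sketch defaults to k=10.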
Pages: 12015 - 12031
Page count: 16
Related papers
50 in total
  • [1] A training sample selection method for predicting software defects
    Jin, Cong
    [J]. APPLIED INTELLIGENCE, 2023, 53 (10) : 12015 - 12031
  • [2] A method for predicting open source software residual defects
    Ullah, Najeeb
    [J]. SOFTWARE QUALITY JOURNAL, 2015, 23 (01) : 55 - 76
  • [3] A method for predicting open source software residual defects
    Najeeb Ullah
    [J]. Software Quality Journal, 2015, 23 : 55 - 76
  • [4] Hybrid feature selection method for predicting software defect
    A. J. Anju
    J. E. Judith
    [J]. Journal of Engineering and Applied Science, 2024, 71 (1):
  • [5] Predicting software defects with causality tests
    Couto, Cesar
    Pires, Pedro
    Valente, Marco Tulio
    Bigonha, Roberto S.
    Anquetil, Nicolas
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2014, 93 : 24 - 41
  • [6] A Highly Efficient Method for Training Sample Selection in Remote Sensing Classification
    Yang, Chao
    Li, Qingquan
    Wu, Guofeng
    Chen, Junyi
    [J]. 2018 26TH INTERNATIONAL CONFERENCE ON GEOINFORMATICS (GEOINFORMATICS 2018), 2018,
  • [7] Predicting the number of defects remaining in operational software
    Hartman, PJ
    [J]. NAVAL ENGINEERS JOURNAL, 2001, 113 (01) : 23 - 32
  • [8] Predicting the number of defects in a new software version
    Felix, Ebubeogu Amarachukwu
    Lee, Sai Peck
    [J]. PLOS ONE, 2020, 15 (03):
  • [9] Predicting Software Defects with Explainable Machine Learning
    Santos, Geanderson
    Figueiredo, Eduardo
    Veloso, Adriano
    Viggiato, Markos
    Ziviani, Nivio
    [J]. PROCEEDINGS OF THE 19TH BRAZILIAN SYMPOSIUM ON SOFTWARE QUALITY, SBQS 2020, 2020,
  • [10] A Novel Training Sample Selection Method for STAP Based on Clutter Sparse Recovery
    Han, Sudan
    Fan, Chongyi
    Huang, Xiaotao
    [J]. 2016 PROGRESS IN ELECTROMAGNETICS RESEARCH SYMPOSIUM (PIERS), 2016, : 2275 - 2279