Misclassification Cost-Sensitive Software Defect Prediction

Cited by: 4
Authors
Xu, Ling [1, 2]
Wang, Bei [1]
Liu, Ling [2]
Zhou, Mo [1]
Liao, Shengping [1]
Yan, Meng [3]
Affiliations
[1] Chongqing Univ, Sch Software Engn, Chongqing, Peoples R China
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
software defect prediction; cost-sensitive; semi-supervised; unsupervised sampling
DOI
10.1109/IRI.2018.00047
Chinese Library Classification
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Software defect prediction helps developers focus on defective modules for efficient software quality assurance. A common goal shared by existing software defect prediction methods is to attain low classification error rates. These methods suffer from two practical problems: (i) Most of the prediction methods rely on a large amount of labeled training data. However, collecting labeled data is difficult and expensive, and it is hard to obtain classification labels for new software projects or for existing projects without historical defect data. (ii) Software defect datasets are highly imbalanced. In many real-world applications, the misclassification cost of defective modules is generally several times higher than that of non-defective ones. In this paper, we present a misclassification Cost-sensitive approach to Software Defect Prediction (CSDP). The CSDP approach is novel in two aspects: First, CSDP addresses the problem of unlabeled software defect datasets by combining an unsupervised sampling method with a domain-specific misclassification cost model. This preprocessing step selectively samples a small percentage of modules by estimating their classification labels. Second, CSDP builds a cost-sensitive support vector machine model to predict the defect-proneness of the remaining modules, using both the overall classification error rate and the domain-specific misclassification cost as quality metrics. CSDP is evaluated on four NASA projects. Experimental results highlight three interesting observations: (1) CSDP achieves higher Normalized Expected Cost of Misclassification (NECM) compared with state-of-the-art supervised learning models under imbalanced training data with limited labeling. (2) CSDP outperforms state-of-the-art semi-supervised learning methods, which disregard classification costs, especially in recall rate. (3) CSDP enhanced through unsupervised sampling as a preprocessing step prior to training and prediction outperforms the baseline CSDP without the sampling process.
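The abstract names NECM as a quality metric but does not give its formula. As a rough illustration only, the sketch below assumes one commonly used definition, NECM = (C_FP * FP + C_FN * FN) / N, where FP and FN are the counts of false positives and false negatives and N is the number of modules. The function name `necm`, the cost values, and the toy labels are all hypothetical and are not taken from the paper.

```python
def necm(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Normalized expected cost of misclassification (assumed definition).

    y_true, y_pred: sequences of 0 (non-defective) and 1 (defective).
    cost_fp: assumed cost of flagging a clean module as defective.
    cost_fn: assumed cost of missing a defective module.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one module is required")
    # Count the two kinds of error separately so each can carry its own cost.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


def error_rate(y_true, y_pred):
    """Plain classification error rate, ignoring costs."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Toy data: 10 modules, 3 of them defective (imbalanced on purpose).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_cautious = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]  # 2 false positives, 0 false negatives
y_lenient = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]   # 0 false positives, 1 false negative

for name, pred in [("cautious", y_cautious), ("lenient", y_lenient)]:
    print(name, error_rate(y_true, pred), necm(y_true, pred))
```

With these assumed costs, the lenient predictor has the lower error rate (0.1 versus 0.2) but the higher cost (0.5 versus 0.2), which is the kind of disagreement between error rate and misclassification cost that motivates a cost-sensitive metric.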
Pages: 256-263
Number of pages: 8