Misclassification Cost-Sensitive Software Defect Prediction

Times cited: 4
Authors
Xu, Ling [1, 2]
Wang, Bei [1]
Liu, Ling [2]
Zhou, Mo [1]
Liao, Shengping [1]
Yan, Meng [3]
Affiliations
[1] Chongqing Univ, Sch Software Engn, Chongqing, Peoples R China
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
software defect prediction; cost-sensitive; semi-supervised; unsupervised sampling
DOI
10.1109/IRI.2018.00047
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Software defect prediction helps developers focus on defective modules for efficient software quality assurance. A common goal shared by existing software defect prediction methods is to attain low classification error rates. These proposals suffer from two practical problems: (i) Most prediction methods rely on a large amount of labeled training data. However, collecting labeled data is difficult and expensive, and it is hard to obtain classification labels for new software projects or for existing projects without historical defect data. (ii) Software defect datasets are highly imbalanced. In many real-world applications, the cost of misclassifying a defective module is generally several times higher than that of misclassifying a non-defective one. In this paper, we present a misclassification Cost-sensitive approach to Software Defect Prediction (CSDP). The CSDP approach is novel in two aspects. First, CSDP addresses the problem of unlabeled software defect datasets by combining an unsupervised sampling method with a domain-specific misclassification cost model. This preprocessing step selectively samples a small percentage of modules by estimating their classification labels. Second, CSDP builds a cost-sensitive support vector machine model to predict the defect-proneness of the remaining modules, using both the overall classification error rate and the domain-specific misclassification cost as quality metrics. CSDP is evaluated on four NASA projects. Experimental results highlight three interesting observations: (1) CSDP achieves higher Normalized Expected Cost of Misclassification (NECM) compared with state-of-the-art supervised learning models under imbalanced training data with limited labeling. (2) CSDP outperforms state-of-the-art semi-supervised learning methods, which disregard classification costs, especially in recall rate. (3) CSDP enhanced with unsupervised sampling as a preprocessing step prior to training and prediction outperforms the baseline CSDP without the sampling process.
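The abstract names two technical ingredients: a cost-sensitive support vector machine and NECM as a quality metric. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation, and the unsupervised sampling step is not modeled. It trains a linear SVM whose hinge loss is weighted per class (a common way to make an SVM cost-sensitive) and scores predictions with NECM in the assumed form (C_FP * FP + C_FN * FN) / N. The cost values (C_FP = 1, C_FN = 5), the regularization and learning-rate settings, the toy data, and the function names `necm`, `train_cost_sensitive_svm`, and `predict` are all hypothetical.

```python
import random


def necm(y_true, y_pred, c_fp=1.0, c_fn=5.0):
    """Normalized expected cost of misclassification (assumed form).

    Labels: 1 = defective, 0 = non-defective.
    NECM = (c_fp * FP + c_fn * FN) / N, where FN counts defective modules
    predicted as clean (the expensive mistake) and FP the reverse.
    The cost values are hypothetical, not taken from the paper.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (c_fp * fp + c_fn * fn) / len(y_true)


def train_cost_sensitive_svm(X, y, c_fp=1.0, c_fn=5.0, lam=0.01,
                             lr=0.01, epochs=200, seed=0):
    """Linear SVM fit by stochastic subgradient descent on the objective
        lam/2 * ||w||^2 + (1/n) * sum_i cost_i * max(0, 1 - y_i * (w.x_i + b))
    where cost_i is c_fn for defective samples and c_fp for clean ones.
    A hypothetical sketch of a class-weighted SVM, not the CSDP model.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            yi = 1.0 if y[i] == 1 else -1.0          # map {0, 1} -> {-1, +1}
            cost = c_fn if y[i] == 1 else c_fp       # missing a defect costs more
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            grad_w = [lam * wj for wj in w]          # L2 regularization term
            grad_b = 0.0
            if yi * score < 1.0:                     # sample violates the margin
                grad_w = [g - cost * yi * xj for g, xj in zip(grad_w, X[i])]
                grad_b = -cost * yi
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Return 1 (defect-prone) when the decision value is non-negative."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0
            for x in X]


if __name__ == "__main__":
    # Made-up toy data: one scaled metric per module, linearly separable.
    X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
    y_train = [0, 0, 0, 1, 1, 1]
    w, b = train_cost_sensitive_svm(X_train, y_train)
    y_hat = predict(w, b, X_train)
    print("predictions:", y_hat)
    print("NECM:", necm(y_train, y_hat))
```

Raising `c_fn` relative to `c_fp` makes each missed defective module weigh more in both the training loss and the NECM score, which tends to shift the decision boundary toward flagging more modules as defective and trades extra false positives for recall. A production setup would more likely use a kernel SVM library with per-class weights than this hand-written linear solver.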
Pages: 256-263
Number of pages: 8