Misclassification Cost-Sensitive Software Defect Prediction

Cited by: 4
Authors
Xu, Ling [1, 2]
Wang, Bei [1]
Liu, Ling [2]
Zhou, Mo [1]
Liao, Shengping [1]
Yan, Meng [3]
Affiliations
[1] Chongqing Univ, Sch Software Engn, Chongqing, Peoples R China
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
software defect prediction; cost-sensitive; semi-supervised; unsupervised sampling
DOI
10.1109/IRI.2018.00047
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Software defect prediction helps developers focus on defective modules for efficient software quality assurance. A common goal shared by existing software defect prediction methods is to attain low classification error rates. These proposals suffer from two practical problems: (i) Most prediction methods rely on a large amount of labeled training data. However, collecting labeled data is difficult and expensive, and it is hard to obtain classification labels for new software projects or for existing projects without historical defect data. (ii) Software defect datasets are highly imbalanced. In many real-world applications, the misclassification cost of defective modules is generally several times higher than that of non-defective ones. In this paper, we present a misclassification Cost-sensitive approach to Software Defect Prediction (CSDP). The CSDP approach is novel in two aspects. First, CSDP addresses the problem of unlabeled software defect datasets by combining an unsupervised sampling method with a domain-specific misclassification cost model. This preprocessing step selectively samples a small percentage of modules by estimating their classification labels. Second, CSDP builds a cost-sensitive support vector machine model to predict the defect-proneness of the remaining modules, with both overall classification error rate and domain-specific misclassification cost as quality metrics. CSDP is evaluated on four NASA projects. Experimental results highlight three interesting observations: (1) CSDP achieves higher Normalized Expected Cost of Misclassification (NECM) compared with state-of-the-art supervised learning models under imbalanced training data with limited labeling. (2) CSDP outperforms state-of-the-art semi-supervised learning methods, which disregard classification costs, especially in recall rate. (3) CSDP enhanced through unsupervised sampling as a preprocessing step prior to training and prediction outperforms the baseline CSDP without the sampling process.
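The abstract names NECM as a quality metric but does not define it. The following is a minimal sketch under one common formulation, NECM = (C_FP * FP + C_FN * FN) / N. This formula, the function name `necm`, the cost values (1 and 5), and the toy labels are all illustrative assumptions and are not taken from the paper.

```python
def necm(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Expected misclassification cost per module (assumed formulation).

    Labels: 1 = defective, 0 = non-defective.
    cost_fp: assumed cost of flagging a clean module as defective.
    cost_fn: assumed cost of missing a defective module.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


def error_rate(y_true, y_pred):
    """Plain classification error rate, ignoring misclassification costs."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical imbalanced toy data: 2 defective modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 2 false positives, 0 false negatives
lenient = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # 0 false positives, 1 false negative

for name, pred in [("cautious", cautious), ("lenient", lenient)]:
    print(name, error_rate(y_true, pred), necm(y_true, pred))
```

Under these assumed costs, the lenient predictor has the lower error rate yet the higher cost, which illustrates why the abstract treats error rate and misclassification cost as separate metrics.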
Pages: 256-263
Number of pages: 8