A set of measures designed to identify overlapped instances in software defect prediction

Citations: 0
Authors
Shivani Gupta
Atul Gupta
Affiliation
[1] Indian Institute of Information Technology, Design and Manufacturing, Jabalpur
Source
Computing | 2017 / Volume 99
Keywords
Data complexity measures; Class overlapping; Data mining; Machine learning; Software defect prediction
DOI
Not available
Abstract
The performance of learning models depends strongly on the characteristics of the training data. Previous results suggest that overlap between classes and the presence of noise have the strongest impact on the performance of a learning algorithm, and software defect datasets are no exception. Class overlap occurs when data samples appear as valid examples of more than one class, which may also account for the presence of noise in datasets. We aim to investigate how the presence of overlapped instances in a dataset influences classifier performance, and how to deal with the class overlap problem. To obtain a close estimate of class overlap, we propose four measures: nearest enemy ratio, subconcept ratio, likelihood ratio, and soft margin ratio. We performed our investigation on 327 binary defect classification datasets obtained from 54 software projects, where we first identified overlapped datasets using three data complexity measures proposed in the literature. We also incorporate treatment effort into the prediction process. Subsequently, we used our proposed measures to find overlapped instances in the identified overlapped datasets. Our results indicate that training a classifier on data free from overlapped instances leads to improved classifier performance on test data containing overlapped instances. The classifiers perform significantly better when the evaluation measure takes effort into account.
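This record does not give the formal definitions of the four proposed measures, so the following is only a minimal Python sketch of one plausible reading of the nearest enemy ratio and of the filtering step the abstract describes (removing candidate overlapped instances before training). The function name nearest_enemy_ratio, the variables X_train, y_train and clf, and the 1.0 threshold are illustrative assumptions, not the authors' implementation.

import numpy as np

def nearest_enemy_ratio(X, y):
    """Distance to the nearest same-class neighbour divided by the distance to
    the nearest opposite-class neighbour (the "nearest enemy") for each instance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    diff = X[:, None, :] - X[None, :, :]        # pairwise feature differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # Euclidean distance matrix
    np.fill_diagonal(dist, np.inf)              # an instance is not its own neighbour
    ratios = np.empty(len(X))
    for i in range(len(X)):
        friend = dist[i, y == y[i]].min()       # closest sample of the same class
        enemy = dist[i, y != y[i]].min()        # closest sample of the other class
        ratios[i] = friend / enemy              # > 1 means the enemy is closer than any friend
    return ratios

# Illustrative usage (assumed names and threshold):
# ratios = nearest_enemy_ratio(X_train, y_train)
# keep = ratios <= 1.0                          # drop instances flagged as overlapped
# clf.fit(X_train[keep], y_train[keep])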
Pages: 889–914
Number of pages: 25