A set of measures designed to identify overlapped instances in software defect prediction

Authors
Shivani Gupta
Atul Gupta
Affiliation
[1] Indian Institute of Information Technology, Design and Manufacturing Jabalpur
Source
Computing | 2017 / Vol. 99
Keywords
Data complexity measures; Class overlapping; Data mining; Machine learning; Software defect prediction;
DOI
Not available
Abstract
The performance of learning models relies heavily on the characteristics of the training data. Previous results suggest that overlap between classes and the presence of noise have the strongest impact on the performance of learning algorithms, and software defect datasets are no exception. A critical problem for the performance of machine learning classifiers is class overlap, in which data samples appear as valid examples of more than one class and which may be responsible for the presence of noise in datasets. We aim to investigate how the presence of overlapped instances in a dataset influences the classifier's performance, and how to deal with the class overlapping problem. To obtain a close estimate of class overlapping, we propose four measures, namely nearest enemy ratio, subconcept ratio, likelihood ratio and soft margin ratio. We performed our investigations using 327 binary defect classification datasets obtained from 54 software projects, where we first identified overlapped datasets using three data complexity measures proposed in the literature. We also include treatment effort in the prediction process. Subsequently, we used our proposed measures to find overlapped instances in the identified overlapped datasets. Our results indicate that training a classifier on training data free from overlapped instances leads to improved classifier performance on test data containing overlapped instances. The classifiers perform significantly better when the evaluation measure takes effort into account.
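The abstract names the nearest enemy ratio but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: for each instance, the distance to its nearest same-class neighbour divided by the distance to its nearest other-class neighbour (its "enemy"), with instances at or above a threshold treated as overlapped and dropped from the training data. The function names, the Euclidean distance, the threshold of 1.0 and the toy data are all assumptions made for this example, not the authors' definitions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_enemy_ratio(X, y):
    """Assumed definition (not taken from the paper): for each instance,
    distance to nearest same-class neighbour / distance to nearest
    other-class neighbour. Larger values suggest more overlap."""
    ratios = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        friend = min(
            (euclidean(xi, xj) for j, (xj, yj) in enumerate(zip(X, y))
             if j != i and yj == yi),
            default=math.inf,
        )
        enemy = min(
            (euclidean(xi, xj) for xj, yj in zip(X, y) if yj != yi),
            default=math.inf,
        )
        if math.isinf(enemy):
            ratios.append(0.0)        # no other class present: no overlap
        elif enemy == 0.0:
            ratios.append(math.inf)   # identical point with a different label
        else:
            ratios.append(friend / enemy)
    return ratios


def drop_overlapped(X, y, threshold=1.0):
    """Keep only instances whose ratio is below the (assumed) threshold."""
    ratios = nearest_enemy_ratio(X, y)
    kept = [(x, label) for x, label, r in zip(X, y, ratios) if r < threshold]
    return [x for x, _ in kept], [label for _, label in kept]


# Toy data: two tight clusters plus one class-1 point lying near class 0.
X = [(0.0, 0.0), (0.0, 0.2), (0.2, 0.0),
     (5.0, 5.0), (5.0, 5.2), (5.2, 5.0),
     (1.0, 1.0)]
y = [0, 0, 0, 1, 1, 1, 1]

ratios = nearest_enemy_ratio(X, y)
X_clean, y_clean = drop_overlapped(X, y)
print(y_clean)
```

In this toy case only the last point, which is closer to the other class than to its own, is flagged; a classifier would then be trained on `X_clean`, `y_clean` and evaluated on unfiltered test data, as the abstract describes.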
Pages: 889 - 914
Number of pages: 25
Related papers
50 in total
  • [1] A set of measures designed to identify overlapped instances in software defect prediction
    Gupta, Shivani
    Gupta, Atul
    COMPUTING, 2017, 99 (09) : 889 - 914
  • [2] A Rough Set Model for Software Defect Prediction
    Yang Weimin
    Li Longshu
    INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL 1, PROCEEDINGS, 2008, : 747 - +
  • [3] Learning to Identify Unexpected Instances in the Test Set
    Li, Xiao-Li
    Liu, Bing
    Ng, See-Kiong
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2802 - 2807
  • [4] An empirical study on software defect prediction with a simplified metric set
    He, Peng
    Li, Bing
    Liu, Xiao
    Chen, Jun
    Ma, Yutao
    INFORMATION AND SOFTWARE TECHNOLOGY, 2015, 59 : 170 - 190
  • [5] Hybrid deep architecture for software defect prediction with improved feature set
    Shyamala, C.
    Mohana, S.
    Ambika, M.
    Gomathi, K.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (31) : 76551 - 76586
  • [6] Similarity-Based Training Set Recommendation for Software Defect Prediction
    Wang, Chao
    Yu, Qiao
    Han, Hui
    Computer Engineering and Applications, 2023, 59 (09) : 86 - 94
  • [7] Neg/pos-Normalized Accuracy Measures for Software Defect Prediction
    Gan, Maohua
    Yucel, Zeynep
    Monden, Akito
    IEEE ACCESS, 2022, 10 : 134580 - 134591
  • [8] Directly Identify Unexpected Instances in the Test Set by Entropy Maximization
    Sha, Chaofeng
    Xu, Zhen
    Wang, Xiaoling
    Zhou, Aoying
    ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS, 2009, 5446 : 659 - +
  • [9] Research on software defect prediction
    Laboratory for Internet Software Technologies, Institute of Software, Chinese Acad. of Sci., Beijing 100190, China
    Unknown
    Unknown
    Ruan Jian Xue Bao, 2008, 7 : 1565 - 1580
  • [10] Defect prediction for embedded software
    Oral, Atac Deniz
    Bener, Ayse Basar
    2007 22ND INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2007, : 346 - 351