A set of measures designed to identify overlapped instances in software defect prediction

Cited by: 0
Authors
Shivani Gupta
Atul Gupta
Affiliation
[1] Indian Institute of Information Technology, Design and Manufacturing Jabalpur
Source
Computing | 2017 / Vol. 99
Keywords
Data complexity measures; Class overlapping; Data mining; Machine learning; Software defect prediction;
DOI
Not available
Abstract
The performance of learning models relies heavily on the characteristics of the training data. Previous results suggest that overlap between classes and the presence of noise have the strongest impact on the performance of learning algorithms, and software defect datasets are no exception. Class overlap, in which data samples appear as valid examples of more than one class, is a critical problem for machine learning classifiers and may be responsible for the presence of noise in datasets. We aim to investigate how the presence of overlapped instances in a dataset influences classifier performance and how to deal with the class overlap problem. To obtain a close estimate of class overlap, we propose four measures: nearest enemy ratio, subconcept ratio, likelihood ratio, and soft margin ratio. We performed our investigations using 327 binary defect classification datasets obtained from 54 software projects, where we first identified overlapped datasets using three data complexity measures proposed in the literature. We also include treatment effort in the prediction process. Subsequently, we used our proposed measures to find overlapped instances in the identified overlapped datasets. Our results indicate that training a classifier on training data free from overlapped instances leads to improved classifier performance on test data containing overlapped instances. The classifiers perform significantly better when the evaluation measure takes effort into account.
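The abstract names the nearest enemy ratio but does not define it, so the following is only a minimal sketch under an assumed definition: for each instance, the distance to its nearest same-class neighbor divided by the distance to its nearest other-class neighbor (its "nearest enemy"). The function names, the Euclidean distance, the threshold of 1.0, and the toy data are all illustrative assumptions and may differ from the measure proposed in the paper.

```python
import math

def nearest_enemy_ratio(X, y):
    """Assumed definition: d(nearest same-class point) / d(nearest other-class point)."""
    ratios = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        friend = math.inf  # distance to nearest instance with the same label
        enemy = math.inf   # distance to nearest instance with a different label
        for j, (xj, yj) in enumerate(zip(X, y)):
            if i == j:
                continue
            d = math.dist(xi, xj)  # Euclidean distance (an assumption)
            if yj == yi:
                friend = min(friend, d)
            else:
                enemy = min(enemy, d)
        if math.isinf(enemy):
            ratio = 0.0            # no other class present: no overlap possible
        elif enemy == 0.0 or math.isinf(friend):
            ratio = math.inf       # coincides with an enemy, or has no same-class neighbor
        else:
            ratio = friend / enemy
        ratios.append(ratio)
    return ratios

def flag_overlapped(X, y, threshold=1.0):
    """Indices whose ratio exceeds the (hypothetical) threshold."""
    return [i for i, r in enumerate(nearest_enemy_ratio(X, y)) if r > threshold]

# Toy data: two tight clusters plus one class-1 point placed near the class-0 cluster.
X = [(0, 0), (0, 0.2), (0.2, 0), (0.2, 0.2),
     (5, 5), (5, 5.2), (5.2, 5), (1, 1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

flagged = flag_overlapped(X, y)
# Training data with the flagged instances removed, as the abstract describes.
clean_X = [x for i, x in enumerate(X) if i not in flagged]
clean_y = [label for i, label in enumerate(y) if i not in flagged]
print(flagged, len(clean_X))
```

On this toy data only the last instance is closer to an enemy than to a same-class neighbor, so it is the only one removed before training.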
Pages: 889-914
Page count: 25