Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction

Cited by: 10
Authors
Arun, C. [1]
Lakshmi, C. [1]
Affiliations
[1] SRM Inst Sci & Technol, Sch Comp, Chennai 603203, Tamil Nadu, India
Keywords
Class imbalance; Software fault prediction; Synthetic samples; Generating samples of minority class; Oversampling techniques; Genetic algorithm; False alarm rate; Evolutionary algorithm; METRICS; SMOTE;
DOI
10.1007/s00500-021-06112-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance is a long-standing problem in machine learning that hinders the performance of classification algorithms in real-world applications such as electricity pilferage, fraudulent transactions, anomaly detection, and prediction of rare diseases. Class imbalance refers to the problem where the distribution of samples is skewed or biased toward one particular class. Software fault prediction datasets fall into this category by their intrinsic nature, because software contains far fewer defective modules than non-defective modules. Most of the oversampling techniques that have been proposed address the issue by generating synthetic samples of the minority class to balance the dataset. However, the synthetic samples generated are near duplicates, which also results in an over-generalization issue. We thus propose a novel oversampling approach that introduces synthetic samples using a genetic algorithm (GA). GA is a form of evolutionary algorithm that employs biologically inspired techniques such as inheritance, mutation, selection, and crossover. The proposed algorithm generates synthetic samples of the minority class based on a distribution measure and ensures that the samples are diverse within the class and are efficient. The proposed oversampling algorithm has been compared with SMOTE, BSMOTE, ADASYN, random oversampling, MAHAKIL, and a no-sampling approach on 20 defect prediction datasets from the PROMISE repository using five prediction models. The results indicate that the genetic algorithm oversampling approach improves fault prediction performance and reduces the false alarm rate.
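The abstract names the GA operators (selection, crossover, mutation) but not how they are combined, so the following is only a minimal sketch of GA-style oversampling and not the authors' algorithm. The function name `ga_oversample`, the centroid-distance fitness (a stand-in for the paper's unspecified distribution measure), the tournament selection, and the mutation scale are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def ga_oversample(minority, n_new, mutation_rate=0.1, seed=0):
    """Generate n_new synthetic minority rows with a toy genetic algorithm."""
    rng = random.Random(seed)
    cols = list(zip(*minority))
    centroid = [mean(c) for c in cols]
    spread = [pstdev(c) for c in cols]
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    def fitness(row):
        # Assumed diversity proxy: squared distance from the class centroid.
        return sum((x - m) ** 2 for x, m in zip(row, centroid))

    def select():
        # Tournament selection: the fitter of two random rows becomes a parent.
        a, b = rng.choice(minority), rng.choice(minority)
        return a if fitness(a) >= fitness(b) else b

    synthetic = []
    while len(synthetic) < n_new:
        p1, p2 = select(), select()
        # Uniform crossover: each feature is inherited from one of the parents.
        child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
        # Mutation: occasional Gaussian nudge, then clip to the observed range.
        for j in range(len(child)):
            if rng.random() < mutation_rate:
                child[j] += rng.gauss(0.0, 0.1 * spread[j])
            child[j] = min(max(child[j], lo[j]), hi[j])
        synthetic.append(child)
    return synthetic


# Hypothetical defective-module rows: [lines of code, cyclomatic complexity].
defective = [[10.0, 2.0], [12.0, 3.0], [30.0, 9.0], [28.0, 7.0]]
new_rows = ga_oversample(defective, n_new=6)
print(len(new_rows))
```

Because children mix features from two parents instead of interpolating along a line between neighbours, they are less likely to be near duplicates of existing rows. Clipping keeps every synthetic value inside the range observed for the minority class.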
Pages: 12915-12931
Number of pages: 17
Related papers
50 in total
  • [1] Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction
    C. Arun
    C. Lakshmi
    [J]. Soft Computing, 2022, 26 : 12915 - 12931
  • [2] MAHAKIL: Diversity Based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction
    Bennin, Kwabena Ebo
    Keung, Jacky
    Phannachitta, Passakorn
    Monden, Akito
    Mensah, Solomon
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2018, 44 (06) : 534 - 550
  • [3] MAHAKIL: Diversity based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction Extended Abstract
    Bennin, Kwabena E.
    Keung, Jacky
    Phannachitta, Passakorn
    Monden, Akito
    Mensah, Solomon
    [J]. PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, : 699 - 699
  • [4] An Ensemble Oversampling Model for Class Imbalance Problem in Software Defect Prediction
    Huda, Shamsul
    Liu, Kevin
    Abdelrazek, Mohamed
    Ibrahim, Amani
    Alyahya, Sultan
    Al-Dossari, Hmood
    Ahmad, Shafiq
    [J]. IEEE ACCESS, 2018, 6 : 24184 - 24195
  • [5] Support Vector based Oversampling Technique for Handling Class Imbalance in Software Defect Prediction
    Malhotra, Ruchika
    Agrawal, Vaibhav
    Pal, Vedansh
    Agarwal, Tushar
    [J]. 2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 1078 - 1083
  • [6] Adaptive Centre-Weighted Oversampling for Class Imbalance in Software Defect Prediction
    Zhao, Qi
    Yan, Xuefeng
    Zhou, Yong
    [J]. 2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS, 2018, : 223 - 230
  • [7] COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction
    Feng, Shuo
    Keung, Jacky
    Yu, Xiao
    Xiao, Yan
    Bennin, Kwabena Ebo
    Kabir, Md Alamgir
    Zhang, Miao
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 129
  • [8] Class Imbalance Reduction (CIR): A Novel Approach to Software Defect Prediction in the Presence of Class Imbalance
    Bejjanki, Kiran Kumar
    Gyani, Jayadev
    Gugulothu, Narsimha
    [J]. SYMMETRY-BASEL, 2020, 12 (03):
  • [9] GAMC: An Oversampling Method Based on Genetic Algorithm and Monte Carlo Method to Solve the Class Imbalance Issue in Industry
    Fan, Xuekang
    Yu, Hong
    [J]. 2022 INTERNATIONAL CONFERENCE ON INDUSTRIAL IOT, BIG DATA AND SUPPLY CHAIN, IIOTBDSC, 2022, : 127 - 132
  • [10] A Hybrid Approach to Coping with High Dimensionality and Class Imbalance for Software Defect Prediction
    Gao, Kehan
    Khoshgoftaar, Taghi
    Napolitano, Amri
    [J]. 2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012, : 281 - 288