Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction

Cited by: 10
Authors
Arun, C. [1]
Lakshmi, C. [1]
Affiliations
[1] SRM Inst Sci & Technol, Sch Comp, Chennai 603203, Tamil Nadu, India
Keywords
Class imbalance; Software fault prediction; Synthetic samples; Generating samples of minority class; Oversampling techniques; Genetic algorithm; False alarm rate; Evolutionary algorithm; METRICS; SMOTE;
DOI
10.1007/s00500-021-06112-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance is a long-standing problem in machine learning that hinders the performance of classification algorithms in real-world applications such as detection of electricity pilferage and fraudulent transactions, anomaly detection, and prediction of rare diseases. Class imbalance refers to the problem where the sample distribution is skewed or biased toward one particular class. Software fault prediction datasets fall into this category by their intrinsic nature, because software systems contain fewer defective modules than non-defective ones. Most of the oversampling techniques that have been proposed address the issue by generating synthetic samples of the minority class to balance the dataset. However, the synthetic samples generated are near duplicates, which also results in an over-generalization issue. We thus propose a novel oversampling approach that introduces synthetic samples using a genetic algorithm (GA). GA is a form of evolutionary algorithm that employs biologically inspired techniques such as inheritance, mutation, selection, and crossover. The proposed algorithm generates synthetic samples of the minority class based on a distribution measure and ensures that the samples are diverse within the class and are efficient. The proposed oversampling algorithm has been compared with SMOTE, BSMOTE, ADASYN, random oversampling, MAHAKIL, and a no-sampling approach on 20 defect prediction datasets from the PROMISE repository using five prediction models. The results indicate that the genetic algorithm oversampling approach improves fault prediction performance and reduces the false alarm rate.
Pages: 12915 - 12931
Number of pages: 17
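
The abstract names the genetic operators (selection, crossover, mutation) but does not specify the fitness function or the distribution measure, so the following is only a minimal sketch of GA-style oversampling under stated assumptions, not the authors' algorithm. The function name `ga_oversample`, its parameters, and the fitness (scaled distance to the nearest original minority sample, used here to discourage near duplicates) are all hypothetical choices made for illustration.

```python
import numpy as np


def ga_oversample(X_min, n_new, generations=20, mutation_scale=0.05, seed=0):
    """Illustrative GA-style oversampler (hypothetical, not the paper's method).

    X_min: array of minority-class samples, shape (n_samples, n_features).
    Returns n_new synthetic samples inside the minority bounding box.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    lo, hi = X_min.min(axis=0), X_min.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero

    def fitness(cands):
        # Assumed fitness: feature-scaled distance to the nearest original
        # minority sample; larger means less of a near duplicate.
        diff = (cands[:, None, :] - X_min[None, :, :]) / span
        return np.linalg.norm(diff, axis=2).min(axis=1)

    # Initial population: random copies of real minority samples.
    pop = X_min[rng.integers(0, len(X_min), size=n_new)].copy()
    fit = fitness(pop)

    for _ in range(generations):
        # Selection: binary tournament picks a mating partner for each slot.
        a = rng.integers(0, n_new, size=n_new)
        b = rng.integers(0, n_new, size=n_new)
        partners = pop[np.where(fit[a] >= fit[b], a, b)]

        # Crossover: each feature is inherited from either parent.
        mask = rng.random(pop.shape) < 0.5
        children = np.where(mask, pop, partners)

        # Mutation: small Gaussian noise, then clip to the minority range.
        children = children + rng.normal(0.0, mutation_scale, pop.shape) * span
        children = np.clip(children, lo, hi)

        # Replacement: a child replaces its parent only if it is at least as fit.
        child_fit = fitness(children)
        better = child_fit >= fit
        pop[better] = children[better]
        fit[better] = child_fit[better]

    return pop


X_min = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
synthetic = ga_oversample(X_min, n_new=6)
print(synthetic.shape)  # (6, 2)
```

Clipping to the per-feature minimum and maximum is a crude stand-in for "staying within the class"; a real implementation would need a fitness that reflects the minority distribution, as the abstract indicates the proposed method does.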