Mutation-based data augmentation for software defect prediction

Cited by: 1
Authors
Mao, Rui [1]
Zhang, Li [1]
Zhang, Xiaofang [1, 2]
Affiliations
[1] Soochow University, School of Computer Science and Technology, Suzhou, People's Republic of China
[2] Soochow University, School of Computer Science and Technology, Suzhou 215006, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
imbalance learning; mutation testing; oversampling; software defect prediction; MACHINE
DOI
10.1002/smr.2634
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Software defect prediction (SDP) aims to distinguish defective from nondefective instances, but the imbalance between these two classes often degrades prediction performance. Conventional SDP approaches address imbalanced data with oversampling techniques such as synthetic oversampling. However, these methods synthesize new instances only from traditional code features, without considering actual defects at the code level. To address data imbalance while preserving the semantic features of code samples, this paper proposes a mutation-based data augmentation approach for SDP, in which data are augmented at the code level. The method applies mutation operators to nondefective instances to generate mutants, which are then used as new defective instances. The approach is evaluated on six projects from the PROMISE dataset using four traditional and two deep learning classifiers. Compared with other data augmentation methods, it improves defect prediction performance for both traditional and deep classifiers and achieves the best average overall performance.
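The core idea (mutate nondefective code, relabel the mutants as defective, and add them until the classes balance) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Python source code (commonly used PROMISE projects are written in Java), relies on Python's standard `ast` module (Python 3.9+ for `ast.unparse`), and uses a single relational-operator-replacement operator chosen purely for illustration. The names `SWAPS`, `MutateOneComparison`, `mutants`, and `augment` are hypothetical. A real SDP pipeline would also extract features from the mutants before training a classifier; that step is omitted here.

```python
import ast
import random

# Hypothetical relational-operator-replacement table (illustrative only;
# the paper's actual mutation operators are not given in this record).
SWAPS = {
    ast.Lt: ast.GtE, ast.GtE: ast.Lt,
    ast.Gt: ast.LtE, ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


class MutateOneComparison(ast.NodeTransformer):
    """Replace exactly one comparison operator (the one at index `target`)
    with its negation, leaving all other operators unchanged."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)  # visit nested comparisons first
        new_ops = []
        for op in node.ops:
            swap = SWAPS.get(type(op))
            if swap is not None:
                if self.seen == self.target:
                    op = swap()
                self.seen += 1
            new_ops.append(op)
        node.ops = new_ops
        return node


def mutants(source):
    """Return one mutant (as source text) per mutable comparison operator."""
    tree = ast.parse(source)
    n_sites = sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Compare)
        for op in node.ops
        if type(op) in SWAPS
    )
    result = []
    for i in range(n_sites):
        # Parse a fresh tree per mutant so each mutant has exactly one change.
        mutated = MutateOneComparison(i).visit(ast.parse(source))
        result.append(ast.unparse(mutated))
    return result


def augment(samples, seed=0):
    """Balance (source, label) pairs, where 1 = defective and 0 = nondefective,
    by adding mutants of nondefective samples labeled as defective."""
    clean = [src for src, label in samples if label == 0]
    n_defective = sum(1 for _, label in samples if label == 1)
    deficit = max(len(clean) - n_defective, 0)
    pool = [m for src in clean for m in mutants(src)]
    random.Random(seed).shuffle(pool)
    return samples + [(m, 1) for m in pool[:deficit]]
```

For example, `augment([("x = a < b", 0), ("y = a == b and c != d", 0)])` returns the two original samples plus two mutants labeled 1. One caveat well known from mutation testing is equivalent mutants, which change the code without changing its behavior; labeling them as defective adds label noise, and this sketch does not filter them out.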
Pages: 17