Mutation-based data augmentation for software defect prediction

Cited by: 1

Authors:

Mao, Rui [1]

Zhang, Li [1]

Zhang, Xiaofang [1,2]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China

[2] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China

Source:

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS | 2023 / Vol. 36 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

imbalance learning; mutation testing; oversampling; software defect prediction; MACHINE;

DOI:

10.1002/smr.2634

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Software defect prediction (SDP) aims to distinguish between defective and nondefective instances, but the imbalance between these two classes often reduces prediction performance. Conventional SDP approaches tackle imbalanced data with oversampling techniques, such as synthetic oversampling. However, these methods merely synthesize new instances from traditional code features without considering actual defects at the code level. To address data imbalance while preserving the semantic features of code samples, a mutation-based data augmentation approach for SDP is proposed. The method applies mutation operators to nondefective instances and uses the resulting mutants as new defective instances, so the data are augmented at the code level. Six projects from the PROMISE dataset are used to evaluate the approach, employing four traditional and two deep classifiers. The experimental results demonstrate that the method improves defect prediction performance for both traditional and deep classifiers compared with other data augmentation methods and achieves the best average overall performance.
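The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the mutation operators or tooling used, so everything below is an assumption made for illustration: a single relational operator replacement (ROR) mutation applied by regular expression, and the hypothetical helpers `mutate` and `augment`. Mutants of nondefective snippets are labeled defective and added until the two classes are balanced.

```python
import random
import re
from typing import List, Optional, Tuple

# Assumed operator: relational operator replacement (ROR).
# The paper's actual operators are not given in the abstract.
ROR_SWAPS = {"<=": "<", ">=": ">", "<": "<=", ">": ">=", "==": "!=", "!=": "=="}
# Two-character operators come first so "<=" is not matched as "<".
# Simplification: a real tool would work on the syntax tree, not raw text.
REL_OP = re.compile(r"<=|>=|==|!=|<|>")


def mutate(source: str, rng: random.Random) -> Optional[str]:
    """Replace one randomly chosen relational operator; None if there is none."""
    matches = list(REL_OP.finditer(source))
    if not matches:
        return None
    m = rng.choice(matches)
    return source[:m.start()] + ROR_SWAPS[m.group()] + source[m.end():]


def augment(samples: List[Tuple[str, int]], seed: int = 0,
            max_attempts: int = 1000) -> List[Tuple[str, int]]:
    """Add mutants of nondefective code (label 0) as defective (label 1)
    until both classes have the same size or attempts run out."""
    rng = random.Random(seed)
    clean = [src for src, label in samples if label == 0]
    needed = len(clean) - sum(1 for _, label in samples if label == 1)
    seen = {src for src, _ in samples}
    out = list(samples)
    for _ in range(max_attempts):
        if needed <= 0 or not clean:
            break
        mutant = mutate(rng.choice(clean), rng)
        if mutant is None or mutant in seen:
            continue  # no operator to mutate, or duplicate mutant
        seen.add(mutant)
        out.append((mutant, 1))
        needed -= 1
    return out


samples = [
    ("if (a < b) return a;", 0),
    ("while (i <= n) i++;", 0),
    ("x = y + 1;", 0),
    ("if (p == null) p.run();", 1),
]
result = augment(samples)
```

Here the new defective instances are real code variants rather than points interpolated in feature space, which is the contrast with synthetic oversampling that the abstract draws. Whether a given mutant is truly defective is itself an assumption of this sketch.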


Pages: 17
