Software Defect Prediction with Skewed Data

Cited by: 0

Authors:

Seliya, Naeem [1]

Khoshgoftaar, Taghi M. [2]

Affiliations:

[1] Univ Michigan, 4901 Evergreen Rd, Dearborn, MI 48128 USA

[2] Florida Atlantic Univ, Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA

Source:

16TH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN | 2010

Keywords:

defect prediction; software metrics; skewed data; machine learning; data sampling; boosting; classification

DOI:

Not available

Chinese Library Classification:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Software defect prediction is often employed as a cost-effective tool to focus quality-improvement resources on poor-quality program modules. Many software measurement and defect data sets are very skewed, where the proportion of not-fault-prone (majority class) modules is substantially larger than that of fault-prone (minority class) modules. Data sampling and Boosting (Boost) are useful techniques for alleviating this problem. While various Data Sampling techniques are available, our prior studies have shown Random Undersampling (RUS) to be effective. RUS works by randomly removing instances from the majority class of the training data. This paper investigates combining Data Sampling and Boosting (RUSBoost) for software defect prediction with skewed software measurement and defect data sets. We consider two variations of RUSBoost depending on the percentage of fault-prone modules desired in the post-sampling training data set. Labeled as RUSBoost_A and RUSBoost_B, they respectively represent whether 35% or 50% of the post-sampling training data set should be fault-prone modules. A case study of 15 data sets from multiple real-world projects is used to demonstrate that RUSBoost performs significantly better than a model built without any Data Sampling or Boosting technique. Moreover, RUSBoost_B significantly outperforms RUSBoost_A.
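The sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_undersample`, the label encoding (1 = fault-prone, 0 = not-fault-prone), and the `seed` parameter are assumptions made for illustration, and the boosting iterations of RUSBoost are omitted. The sketch only shows how majority-class instances could be randomly removed until fault-prone modules make up a target share (35% or 50%) of the training data.

```python
import random


def random_undersample(labels, minority_fraction, seed=0):
    """Return sorted indices of a subset of the training data in which
    about `minority_fraction` of the instances are minority class (label 1).

    All minority instances are kept; majority instances (label 0) are
    removed at random. Names and label encoding are illustrative assumptions.
    """
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be strictly between 0 and 1")

    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]

    # Solve n_min / (n_min + n_keep) = minority_fraction for n_keep.
    n_keep = round(len(minority) * (1 - minority_fraction) / minority_fraction)
    # Undersampling can only remove instances, never add them.
    n_keep = min(n_keep, len(majority))

    rng = random.Random(seed)
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Hypothetical skewed data set: 7 fault-prone modules out of 100.
labels = [1] * 7 + [0] * 93
subset_a = random_undersample(labels, 0.35)  # 35% target, as in RUSBoost_A
subset_b = random_undersample(labels, 0.50)  # 50% target, as in RUSBoost_B
print(len(subset_a), len(subset_b))
```

With 7 fault-prone modules, the 35% target keeps 13 majority instances (20 in total) and the 50% target keeps 7 (14 in total), so the higher target discards more of the majority class.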

Pages: 403 / +

Page count: 2
