Early Software Defects Density Prediction: Training the International Software Benchmarking Cross Projects Data Using Supervised Learning

Cited by: 8
Authors
Tahir, Touseef [1]
Gencel, Cigdem [2]
Rasool, Ghulam [1]
Umer, Tariq [1]
Rasheed, Jawad [3, 4, 5]
Yeo, Sook Fern [6, 7]
Cevik, Taner [8]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Lahore Campus, Lahore 54000, Pakistan
[2] Ankara Medipol Univ, Dept Management Informat Syst, TR-06050 Ankara, Turkiye
[3] Istanbul Sabahattin Zaim Univ, Dept Comp Engn, TR-34303 Istanbul, Turkiye
[4] Istanbul Nisantasi Univ, Dept Software Engn, TR-34398 Istanbul, Turkiye
[5] Bogazici Univ, Deep Learning & Med Image Anal Lab, TR-34342 Istanbul, Turkiye
[6] Multimedia Univ, Fac Business, Malacca 75450, Malaysia
[7] Daffodil Int Univ, Dept Business Adm, Dhaka 1207, Bangladesh
[8] Istanbul Arel Univ, Dept Comp Engn, TR-34537 Istanbul, Turkiye
Keywords
Software engineering; Costs; Machine learning; Codes; Predictive models; Training; Size measurement; Fault detection; Prediction methods; Supervised learning; Project management; Cross projects dataset; defect prediction; feature selection; fault prediction; ISBSG dataset; machine learning;
DOI
10.1109/ACCESS.2023.3339994
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent reviews of the literature indicate the need for empirical studies on cross-project defect prediction (CPDP) that would allow aggregation of the evidence and improve predictive performance. Most empirical studies predict defects at the method, class, file, and module/package levels of granularity during the coding phase, and thereby avoid external failure costs. The main goal of this study is to predict defects early, at the beginning of a project and at the product level of granularity, so that the prediction can serve as input for planning the project's quality activities. With proper quality planning, both internal and external failure costs can then be avoided as far as possible. We first conducted a systematic mapping study (SMS) of secondary studies (literature reviews) on defect prediction to identify the most used datasets, the project attributes and metrics used as estimators, and the supervised learning methods used for training. We then conducted an empirical study on defect density prediction using cross-project data. We collected data on 760 projects from the International Software Benchmarking (ISBSG) dataset version 11 that reported both defects and functional size attributes. We trained the prediction models using: i) the complete set of project attributes, ii) the individual attributes, and iii) multiple subsets of attributes. We employed both classification and regression approaches of machine learning. The models were trained on the original values of the dataset and on z-score and log transformations of those values to explore the effects of data normalization on prediction. Most machine learning models trained on the z-score transformation of the dataset performed best for classifying defects. The Multilayer Perceptron (neural network) model trained on the z-score transformation of the complete dataset predicted defects with the highest F1-score, 0.89, using binary classification. The log transformation and feature selection methods improved the results for multivariable regression. Multivariable regression achieved its best Root Mean Squared Error (RMSE) and R² (R-squared) values, 0.4 and 0.9 respectively, with a subset of 11 features under the log transformation. The results of the classification and regression approaches indicate that defects can be predicted with reasonable accuracy at the software product level using cross-project data.
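The transformations and metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and sample values are hypothetical, the use of `log1p` (log of 1 + x) for the log transformation is an assumption made so that zero values stay defined, and defect density is assumed here to mean defects divided by functional size.

```python
import math
from statistics import mean, pstdev


def defect_density(defects, functional_size):
    """Assumed definition: defects per unit of functional size."""
    return defects / functional_size


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """Natural log of (1 + v); log1p is an assumption, chosen to handle zeros."""
    return [math.log1p(v) for v in values]


def rmse(actual, predicted):
    """Root Mean Squared Error: lower is better."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(mean(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1.0 means a perfect fit."""
    mu = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def f1_score(actual, predicted):
    """F1 for binary labels, where 1 marks the positive (defective) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from ISBSG.
    sizes = [120.0, 450.0, 900.0]
    defects = [6.0, 9.0, 45.0]
    densities = [defect_density(d, s) for d, s in zip(defects, sizes)]
    print(z_score(densities))
    print(log_transform(densities))
```

Both transformations change only the scale of the inputs, so any comparison between them, as in the abstract, comes down to how a given model responds to that rescaling.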
Pages: 141965-141986
Page count: 22