Dealing with missing software project data

Cited by: 47

Authors:

Cartwright, MH [1]

Shepperd, MJ [1]

Song, Q [1]

Affiliation:

[1] Bournemouth Univ, Sch Design Engn & Comp, Empir Software Engn Res Grp, Bournemouth BH1 3LT, Dorset, England

Source:

NINTH INTERNATIONAL SOFTWARE METRICS SYMPOSIUM, PROCEEDINGS | 2003

Keywords:

project effort estimation; imputation; data analysis;

DOI:

10.1109/METRIC.2003.1232464

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. Naturally, this problem is not unique to software engineering, so in this paper we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. In order to assess the potential value of imputation, we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and the projects are quite heterogeneous. The question we pose is: can imputation help? To answer it, we compare the quality of fit of effort models derived by stepwise regression on the raw data with that of models derived from data sets whose missing values have been imputed by various techniques. In both data sets we find that k-Nearest Neighbour (k-NN) and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently, we conclude that imputation can assist empirical software engineering.
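As an illustration only, the two imputation techniques named in the abstract can be sketched in a few lines of Python. This is not the authors' implementation: the function names, the toy `projects` table (columns assumed to be size, team size and effort), the distance measure (root mean squared difference over the columns both projects have observed) and the choice of `k` are all assumptions made for this sketch, and no feature scaling is applied.

```python
import math

def mean_impute(rows):
    """Sample mean imputation: replace each None with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def _distance(a, b):
    """Root mean squared difference over columns observed in both rows (None if no overlap)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=2):
    """k-NN imputation: fill each None with the mean of that column over the k closest rows."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only rows that actually have a value in column j can act as donors.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result

# Hypothetical toy data: [size, team size, effort]; the last project lacks a team size.
projects = [
    [10.0, 2.0, 100.0],
    [12.0, 2.0, 120.0],
    [50.0, 8.0, 600.0],
    [11.0, None, 110.0],
]

smi_filled = mean_impute(projects)
knn_filled = knn_impute(projects, k=2)
```

In this toy table the mean-based fill is pulled towards the one large project, whereas the k-NN fill borrows only from the two projects most similar to the incomplete one. That difference is the intuition behind comparing the two techniques; it says nothing about the results reported in the paper.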


Pages: 154-165

Page count: 12

