Dealing with missing software project data

Cited by: 47
Authors
Cartwright, MH [1]
Shepperd, MJ [1]
Song, Q [1]
Affiliations
[1] Bournemouth Univ, Sch Design Engn & Comp, Empir Software Engn Res Grp, Bournemouth BH1 3LT, Dorset, England
Keywords
project effort estimation; imputation; data analysis
DOI
10.1109/METRIC.2003.1232464
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. Naturally this problem is not unique to software engineering, so in this paper we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. In order to assess the potential value of imputation we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and the projects are quite heterogeneous. The question we pose is: can imputation help? To answer it, we compare the quality of fit of effort models derived by stepwise regression on the raw data with that of models derived from data sets whose missing values were imputed by various techniques. In both data sets we find that k-Nearest Neighbour (k-NN) and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently, we conclude that imputation can assist empirical software engineering.
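The two imputation techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy `projects` array (columns assumed to be size, team size, and effort), the distance measure, and the choice of k are all assumptions made for illustration, and a real analysis would normally rescale features before computing distances.

```python
import numpy as np


def sample_mean_impute(data):
    """Replace each NaN with the mean of the observed values in its column."""
    filled = np.asarray(data, dtype=float).copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def knn_impute(data, k=2):
    """Replace each NaN with the mean of that feature over the k nearest rows.

    Distance between two rows is the root mean squared difference over the
    features observed in both rows (an assumed, simple choice). A value is
    left as NaN if no other row can act as a donor.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    n_rows = data.shape[0]
    for i in range(n_rows):
        for j in np.where(np.isnan(data[i]))[0]:
            candidates = []
            for other in range(n_rows):
                # A donor must be a different row with feature j observed.
                if other == i or np.isnan(data[other, j]):
                    continue
                shared = ~np.isnan(data[i]) & ~np.isnan(data[other])
                if not shared.any():
                    continue
                diff = data[i, shared] - data[other, shared]
                dist = np.sqrt(np.mean(diff ** 2))
                candidates.append((dist, data[other, j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                filled[i, j] = np.mean([value for _, value in candidates[:k]])
    return filled


# Hypothetical toy data: one project is missing its effort value.
projects = np.array([
    [10.0, 2.0, 100.0],
    [12.0, 2.0, np.nan],
    [50.0, 8.0, 900.0],
    [11.0, 3.0, 120.0],
])

smi_result = sample_mean_impute(projects)
knn_result = knn_impute(projects, k=2)
```

In this toy case SMI fills the gap with the column mean, which is pulled upward by the one large project, whereas k-NN borrows only from the two most similar small projects. That difference is one plausible reason a neighbour-based method might fit heterogeneous project data better, although the sketch itself demonstrates nothing about the paper's data sets.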
Pages: 154-165
Page count: 12