Dealing with missing software project data

Cited by: 47
Authors
Cartwright, MH [1]
Shepperd, MJ [1]
Song, Q [1]
Affiliation
[1] Bournemouth Univ, Sch Design Engn & Comp, Empir Software Engn Res Grp, Bournemouth BH1 3LT, Dorset, England
Keywords
project effort estimation; imputation; data analysis
DOI
10.1109/METRIC.2003.1232464
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. Naturally this problem is not unique to software engineering, so in this paper we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. In order to assess the potential value of imputation we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and the projects are quite heterogeneous. The question we pose is: can imputation help? To answer it, we compare the quality of fit of effort models derived by stepwise regression on the raw data with that of models derived from data sets whose missing values have been imputed by various techniques. In both data sets we find that k-Nearest Neighbour (k-NN) and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently we conclude that imputation can assist empirical software engineering.
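The two imputation techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of `None` for missing values, the distance measure (root mean squared difference over the features both projects have recorded), and the sample `rows` are all assumptions made for illustration. Features are left unscaled here, so in practice large-valued columns would dominate the distance.

```python
import math


def sample_mean_impute(rows):
    """SMI: replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # A column with no observed values cannot be imputed; leave it as None.
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def _distance(a, b):
    """Root mean squared difference over features observed in both rows.

    Returns None when the two rows share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """k-NN: replace each None with the mean of that column over the k
    nearest rows that have the column observed."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


# Hypothetical project data: [size, team members, effort]; None = missing.
rows = [
    [10.0, 2.0, 100.0],
    [12.0, 3.0, 120.0],
    [50.0, 9.0, None],
    [11.0, None, 110.0],
    [48.0, 8.0, 500.0],
]

smi_rows = sample_mean_impute(rows)
knn_rows = knn_impute(rows, k=1)
```

On this toy data, SMI fills the missing effort of the third project with the column mean, whereas k-NN with k=1 borrows the effort of the most similar project (the fifth), which shows why a neighbour-based estimate can suit heterogeneous projects better than a global mean.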
Pages: 154-165
Number of pages: 12