Analyzing data sets with missing data: An empirical evaluation of imputation methods and likelihood-based methods

Cited by: 152
Authors
Myrtveit, I [1]
Stensrud, E [1]
Olsson, UH [1]
Affiliation
[1] Norwegian Sch Management, N-1301 Sandvika, Norway
Keywords
software effort prediction; cost estimation; missing data; imputation methods; listwise deletion; mean imputation; similar response pattern imputation; full information maximum likelihood; log-log regression; ERP;
DOI
10.1109/32.965340
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Missing data are often encountered in data sets used to construct effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. In this paper, we evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern imputation (SRPI), and full information maximum likelihood (FIML). We apply the MDTs to an ERP data set, and thereafter construct regression-based prediction models using the resulting data sets. The evaluation suggests that only FIML is appropriate when the data are not missing completely at random (MCAR). Unlike FIML, prediction models constructed on LD, MI and SRPI data sets will be biased unless the data are MCAR. Furthermore, compared to LD, MI and SRPI seem appropriate only if the resulting LD data set is too small to enable the construction of a meaningful regression-based prediction model.
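As a rough illustration of the two simplest techniques named in the abstract, the sketch below applies listwise deletion and mean imputation to a toy table and then fits a log-log regression of effort on size. The data values, the column meaning (size, effort), and the function names are all hypothetical and are not taken from the paper; SRPI and FIML are not shown.

```python
import math

# Hypothetical toy records (size, effort); None marks a missing value.
# These numbers are invented for illustration, not the paper's ERP data.
rows = [
    (10.0, 100.0),
    (None, 150.0),
    (30.0, 310.0),
    (20.0, None),
    (40.0, 390.0),
]


def listwise_deletion(data):
    """Listwise deletion (LD): keep only rows with no missing values."""
    return [r for r in data if all(v is not None for v in r)]


def mean_imputation(data):
    """Mean imputation (MI): fill each gap with the mean of the
    observed values in the same column."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in data if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        tuple(means[j] if r[j] is None else r[j] for j in range(n_cols))
        for r in data
    ]


def fit_log_log(data):
    """Least-squares fit of ln(effort) = a + b * ln(size).
    Expects complete rows with strictly positive values."""
    xs = [math.log(size) for size, _ in data]
    ys = [math.log(effort) for _, effort in data]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


ld_rows = listwise_deletion(rows)   # fewer rows, no invented values
mi_rows = mean_imputation(rows)     # all rows kept, gaps filled with means

print("LD:", len(ld_rows), "rows, (a, b) =", fit_log_log(ld_rows))
print("MI:", len(mi_rows), "rows, (a, b) =", fit_log_log(mi_rows))
```

Comparing the two fits on the same toy table shows the trade-off the abstract describes: LD shrinks the sample, while MI keeps every row but replaces the gaps with a constant.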
Pages: 999-1013
Page count: 15
Related papers
50 in total
  • [41] Imputation Methods for Multiple Regression with Missing Heteroscedastic Data
    Asif, Muhammad
    Samart, Klairung
    [J]. THAILAND STATISTICIAN, 2022, 20 (01): : 1 - 15
  • [42] Some Concerns About Imputation Methods for Missing Data
    Toyomoto, Rie
    Funada, Satoshi
    Furukawa, Toshi A.
    [J]. JAMA PSYCHIATRY, 2022, 79 (03) : 270 - 270
  • [43] Missing data imputation methods and their performance with biodistance analyses
    Kenyhercz, Michael W.
    Passalacqua, Nicholas V.
    [J]. AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY, 2015, 156 : 185 - 185
  • [44] Evaluating Imputation Methods for Missing Data in a MCI Dataset
    Gomez-Valades Batanero, Alba
    Rincon Zamorano, Mariano
    Martinez Tomas, Rafael
    Guerrero Martin, Juan
    [J]. ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 446 - 454
  • [45] Missing Network Data A Comparison of Different Imputation Methods
    Krause, Robert W.
    Huisman, Mark
    Steglich, Christian
    Snijders, Tom A. B.
    [J]. 2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018, : 159 - 163
  • [46] Spectral methods for imputation of missing air quality data
    Shai Moshenberg
    Uri Lerner
    Barak Fishbain
    [J]. Environmental Systems Research, 4 (1)
  • [47] Technical note: Evaluation of missing data imputation methods for human osteometric measurements
    Pang, Jinyong
    Liu, Xiaoming
    [J]. AMERICAN JOURNAL OF BIOLOGICAL ANTHROPOLOGY, 2023, 181 (04): : 666 - 676
  • [48] Effect of Missing Data Types and Imputation Methods on Supervised Classifiers: An Evaluation Study
    Gabr, Menna Ibrahim
    Helmy, Yehia Mostafa
    Elzanfaly, Doaa Saad
    [J]. BIG DATA AND COGNITIVE COMPUTING, 2023, 7 (01)
  • [49] Evaluation of Missing Data Imputation Methods for an Enhanced Distributed PV Generation Prediction
    Sundararajan, Aditya
    Sarwat, Arif I.
    [J]. PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2019, VOL 1, 2020, 1069 : 590 - 609
  • [50] Likelihood-based inference for spatiotemporal data with censored and missing responses
    Valeriano, Katherine A. L.
    Lachos, Victor H.
    Prates, Marcos O.
    Matos, Larissa A.
    [J]. ENVIRONMETRICS, 2021, 32 (03)