Imputation techniques for multivariate missingness in software measurement data

Authors
Taghi M. Khoshgoftaar
Jason Van Hulse
Affiliation
Department of Computer Science and Engineering, Florida Atlantic University
Source
Software Quality Journal, 2008, Vol. 16
Keywords
Imputation; Software quality; Missing data; Data quality; Bayesian multiple imputation
Abstract
The problem of missing values in software measurement data used in empirical analysis has led to numerous proposed solutions. Imputation procedures, for example, ‘fill in’ the missing values with plausible substitutes. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two datasets with dramatically different properties were used, and missing values were injected according to three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and non-ignorable (NI). We consider missing values occurring in multiple attributes simultaneously, and compare three procedures: Bayesian multiple imputation, k-nearest-neighbor (kNN) imputation, and mean imputation. We also examine the relationship between noise in the dataset and the performance of the imputation techniques, a question that has not been addressed previously. Our comprehensive experiments demonstrate conclusively that Bayesian multiple imputation is an extremely effective imputation technique.
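The three missingness mechanisms and the two simpler imputation procedures named above can be illustrated with a minimal sketch. It is not the authors' experimental setup. The synthetic data generator, the choice of column 0 as the always-observed attribute that drives MAR, the 0.5×/1.5× probability split used for MAR and NI, the use of complete cases as kNN donors with standardized Euclidean distance, and all function names are assumptions made for illustration. Bayesian multiple imputation is not sketched here.

```python
import math
import random
import statistics

# Hypothetical synthetic "software metrics" (not the paper's datasets): each row
# is a module with three correlated attributes. Column 0 is never made missing.
def make_data(n, rng):
    rows = []
    for _ in range(n):
        size = rng.gauss(100.0, 30.0)                 # e.g. lines of code
        branches = 0.3 * size + rng.gauss(0.0, 5.0)   # correlated with size
        faults = 0.05 * size + 0.1 * branches + rng.gauss(0.0, 1.0)
        rows.append([size, branches, faults])
    return rows

def inject_missing(rows, cols, rate, mechanism, rng):
    """Copy rows, setting cells in `cols` to None under MCAR, MAR, or NI.

    Assumes rate <= 2/3 so every probability stays <= 1. Because several
    columns are masked independently, a row may lose more than one value.
    """
    out = [r[:] for r in rows]
    driver_median = statistics.median(r[0] for r in rows)
    for c in cols:
        col_median = statistics.median(r[c] for r in rows)
        for orig, new in zip(rows, out):
            if mechanism == "MCAR":    # independent of all values
                p = rate
            elif mechanism == "MAR":   # depends only on observed column 0
                p = 1.5 * rate if orig[0] > driver_median else 0.5 * rate
            elif mechanism == "NI":    # depends on the (hidden) value itself
                p = 1.5 * rate if orig[c] > col_median else 0.5 * rate
            else:
                raise ValueError(f"unknown mechanism: {mechanism}")
            if rng.random() < p:
                new[c] = None
    return out

def mean_impute(rows):
    """Replace each missing cell with its column's observed mean."""
    ncol = len(rows[0])
    means = [statistics.fmean(r[c] for r in rows if r[c] is not None)
             for c in range(ncol)]
    return [[means[c] if r[c] is None else r[c] for c in range(ncol)] for r in rows]

def knn_impute(rows, k=5):
    """Fill missing cells with the average of the k nearest complete rows.

    Distance uses only attributes observed in the incomplete row, each scaled
    by its observed standard deviation. Assumes at least one complete row.
    """
    ncol = len(rows[0])
    sd = []
    for c in range(ncol):
        obs = [r[c] for r in rows if r[c] is not None]
        sd.append(statistics.pstdev(obs) or 1.0)
    donors = [r for r in rows if None not in r]
    result = []
    for r in rows:
        miss = [c for c in range(ncol) if r[c] is None]
        if not miss:
            result.append(r[:])
            continue
        seen = [c for c in range(ncol) if r[c] is not None]

        def dist(d):
            return math.sqrt(sum(((r[c] - d[c]) / sd[c]) ** 2 for c in seen))

        nearest = sorted(donors, key=dist)[:k]
        filled = r[:]
        for c in miss:
            filled[c] = statistics.fmean(d[c] for d in nearest)
        result.append(filled)
    return result

def rmse(truth, masked, imputed):
    """Root-mean-square error over the cells that were masked."""
    errs = [(imputed[i][c] - truth[i][c]) ** 2
            for i, row in enumerate(masked)
            for c, v in enumerate(row) if v is None]
    return math.sqrt(statistics.fmean(errs)) if errs else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(500, rng)
    for mech in ("MCAR", "MAR", "NI"):
        masked = inject_missing(data, cols=[1, 2], rate=0.2, mechanism=mech, rng=rng)
        print(mech,
              "mean RMSE=%.2f" % rmse(data, masked, mean_impute(masked)),
              "kNN RMSE=%.2f" % rmse(data, masked, knn_impute(masked, k=5)))
```

Running the script prints, for each mechanism, the RMSE of mean and kNN imputation on the injected cells; the exact figures depend on the seed. Because the synthetic attributes are correlated, kNN should typically beat mean imputation here, which only reconstructs each column's center and ignores relationships between attributes.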
Pages: 563–600