Imputation techniques for multivariate missingness in software measurement data

Authors
Taghi M. Khoshgoftaar
Jason Van Hulse
Affiliation
[1] Florida Atlantic University, Department of Computer Science and Engineering
Source
Software Quality Journal | 2008 / Volume 16
Keywords
Imputation; Software quality; Missing data; Data quality; Bayesian multiple imputation
DOI
Not available
Abstract
The problem of missing values in software measurement data used in empirical analysis has led to the proposal of numerous potential solutions. Imputation procedures, for example, have been proposed to ‘fill-in’ the missing values with plausible alternatives. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two different datasets with dramatically different properties were utilized in this study, with the injection of missing values according to three different missingness mechanisms (MCAR, MAR, and NI). We consider the occurrence of missing values in multiple attributes, and compare three procedures, Bayesian multiple imputation, k Nearest Neighbor imputation, and Mean imputation. We also examine the relationship between noise in the dataset and the performance of the imputation techniques, which has not been addressed previously. Our comprehensive experiments demonstrate conclusively that Bayesian multiple imputation is an extremely effective imputation technique.
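To make the compared procedures concrete, the following is a minimal Python sketch of two of the three techniques named in the abstract (Mean imputation and k Nearest Neighbor imputation), evaluated after injecting missing values completely at random (MCAR) into several attributes at once. It is an illustration under stated assumptions, not the authors' experimental setup: the synthetic data, the function names, the distance rule (Euclidean over attributes observed in both rows), the choice of k, and the mean-absolute-error metric are all hypothetical. Bayesian multiple imputation and the MAR and NI mechanisms are not sketched here.

```python
import math
import random


def inject_mcar(rows, rate, rng):
    """Copy `rows`, setting each cell to None independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Fill each missing cell with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Root mean squared difference over attributes observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Fill each missing cell with the average of that attribute over the k nearest donor rows."""
    fallback = mean_impute(rows)  # used when no comparable donor exists
    result = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            donors = [val for dist, val in candidates if dist != math.inf][:k]
            new_row[j] = sum(donors) / len(donors) if donors else fallback[i][j]
        result.append(new_row)
    return result


def mean_abs_error(truth, masked, imputed):
    """Mean absolute error over the cells that were injected as missing."""
    errors = [
        abs(t - p)
        for t_row, m_row, p_row in zip(truth, masked, imputed)
        for t, m, p in zip(t_row, m_row, p_row)
        if m is None
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical stand-in for a software measurement dataset: three correlated attributes.
rng = random.Random(0)
truth = []
for _ in range(60):
    x = rng.uniform(0.0, 10.0)
    truth.append([x, 2.0 * x + rng.gauss(0.0, 0.5), 0.5 * x + rng.gauss(0.0, 0.5)])

masked = inject_mcar(truth, rate=0.15, rng=rng)
print("mean imputation MAE:", mean_abs_error(truth, masked, mean_impute(masked)))
print("kNN imputation MAE: ", mean_abs_error(truth, masked, knn_impute(masked, k=3)))
```

Because the missing cells are injected into a complete dataset, the true values are known and the imputation error can be measured directly; the numbers printed by this toy example say nothing about the results reported in the paper.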
Pages: 563 - 600
Page count: 37