Imputation techniques for multivariate missingness in software measurement data

Authors
Taghi M. Khoshgoftaar
Jason Van Hulse
Affiliation
[1] Florida Atlantic University, Department of Computer Science and Engineering
Source
Software Quality Journal | 2008 / Vol. 16
Keywords
Imputation; Software quality; Missing data; Data quality; Bayesian multiple imputation
Abstract
The problem of missing values in software measurement data used in empirical analysis has led to the proposal of numerous potential solutions. Imputation procedures, for example, have been proposed to fill in the missing values with plausible alternatives. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two datasets with dramatically different properties were used in this study, with missing values injected according to three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and non-ignorable (NI). We consider the occurrence of missing values in multiple attributes and compare three procedures: Bayesian multiple imputation, k-nearest-neighbor imputation, and mean imputation. We also examine the relationship between noise in the dataset and the performance of the imputation techniques, which has not been addressed previously. Our comprehensive experiments demonstrate conclusively that Bayesian multiple imputation is an extremely effective imputation technique.
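To make the experimental setup in the abstract concrete, the following is a minimal sketch of two of the three procedures it names (mean imputation and k-nearest-neighbor imputation) applied to values removed completely at random. It is not the authors' implementation: the synthetic data, the function names (`inject_mcar`, `mean_impute`, `knn_impute`, `mean_abs_error`), the distance measure, the choice of k, and the error metric are all assumptions made for illustration. The MAR and NI mechanisms and Bayesian multiple imputation are not covered here.

```python
import math
import random


def inject_mcar(rows, rate, rng):
    """Copy rows, setting each cell to None independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Replace each missing cell with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=3):
    """Replace each missing cell with the average of that attribute over the k
    nearest rows that have it observed. Distance is the root mean squared
    difference over attributes observed in both rows (an illustrative choice)."""
    fallback = mean_impute(rows)
    result = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((math.sqrt(sum(shared) / len(shared)), other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            # If no comparable neighbor exists, use the column mean instead.
            new_row[j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
        result.append(new_row)
    return result


def mean_abs_error(truth, masked, imputed):
    """Mean absolute error over the cells that were missing in `masked`."""
    errors = [abs(t - p)
              for tr, mr, ir in zip(truth, masked, imputed)
              for t, m, p in zip(tr, mr, ir) if m is None]
    return sum(errors) / len(errors) if errors else 0.0


# Synthetic, correlated attributes standing in for software metrics (hypothetical data).
rng = random.Random(0)
truth = []
for _ in range(60):
    size = rng.uniform(10, 100)
    truth.append([size, 2 * size + rng.gauss(0, 2), 3 * size + rng.gauss(0, 2)])

masked = inject_mcar(truth, 0.2, rng)
print("mean imputation MAE:", round(mean_abs_error(truth, masked, mean_impute(masked)), 2))
print("kNN imputation MAE:", round(mean_abs_error(truth, masked, knn_impute(masked, k=3)), 2))
```

Because the complete data are known before values are removed, the imputed cells can be scored directly against the originals, which is the general idea behind injecting missingness into a complete dataset.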
Pages: 563 - 600
Page count: 37