Imputation techniques for multivariate missingness in software measurement data

Authors
Taghi M. Khoshgoftaar
Jason Van Hulse
Institution
Department of Computer Science and Engineering, Florida Atlantic University
Source
Software Quality Journal, 2008, Vol. 16, No. 4
Keywords
Imputation; Software quality; Missing data; Data quality; Bayesian multiple imputation
Abstract
The problem of missing values in software measurement data used for empirical analysis has prompted numerous proposed solutions. Imputation procedures, for example, have been proposed to fill in the missing values with plausible substitutes. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two datasets with markedly different properties were used, and missing values were injected according to three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and nonignorable (NI). We consider missing values that occur in multiple attributes and compare three procedures: Bayesian multiple imputation, k-nearest-neighbor (kNN) imputation, and mean imputation. We also examine the relationship between noise in the dataset and the performance of the imputation techniques, a question not previously addressed. Our extensive experiments demonstrate conclusively that Bayesian multiple imputation is an extremely effective imputation technique.
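As a rough illustration of the experimental design summarized above, the Python sketch below masks two of three attributes under MCAR, MAR, and NI and then fills the gaps with mean imputation and a simple kNN imputer. It runs on synthetic data, not the study's datasets, and every detail is a hypothetical choice rather than the authors' procedure: the generator `make_data`, the 20% missingness rate, the median-threshold rules for MAR and NI, the standardized RMS distance, and k = 5. Bayesian multiple imputation is left out, since it draws several imputations from a posterior predictive distribution and pools the results, which does not fit in a short sketch.

```python
import math
import random


def make_data(n, seed=0):
    """Hypothetical stand-in for software metrics: three correlated attributes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        loc = rng.lognormvariate(5.0, 0.8)     # size-like attribute, never masked
        cyclo = 0.05 * loc + rng.gauss(0, 3)   # correlated with size
        faults = 0.01 * loc + rng.gauss(0, 1)  # correlated with size
        rows.append([loc, cyclo, faults])
    return rows


def inject(rows, mechanism, targets=(1, 2), rate=0.2, seed=1):
    """Copy `rows`, setting cells in `targets` to None under MCAR, MAR, or NI.
    MAR and NI use 2 * rate above a median (0 below), so about `rate` overall."""
    rng = random.Random(seed)

    def median(col):
        vals = sorted(r[col] for r in rows)
        return vals[len(vals) // 2]

    med0, meds = median(0), {c: median(c) for c in targets}
    out = [list(r) for r in rows]
    for orig, row in zip(rows, out):
        for c in targets:
            if mechanism == "MCAR":    # independent of any value
                p = rate
            elif mechanism == "MAR":   # depends on attribute 0, which stays observed
                p = 2 * rate if orig[0] > med0 else 0.0
            elif mechanism == "NI":    # depends on the hidden value itself
                p = 2 * rate if orig[c] > meds[c] else 0.0
            else:
                raise ValueError(f"unknown mechanism: {mechanism}")
            if rng.random() < p:
                row[c] = None
    return out


def mean_impute(rows):
    """Replace each missing cell with its column's observed mean."""
    means = []
    for c in range(len(rows[0])):
        obs = [r[c] for r in rows if r[c] is not None]
        means.append(sum(obs) / len(obs))
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=5):
    """Fill each missing cell with the mean value of its k nearest donor rows,
    using RMS distance over standardized attributes both rows observe."""
    ncols = len(rows[0])
    sds = []
    for c in range(ncols):
        obs = [r[c] for r in rows if r[c] is not None]
        mu = sum(obs) / len(obs)
        sds.append(math.sqrt(sum((v - mu) ** 2 for v in obs) / len(obs)) or 1.0)

    def dist(a, b):
        shared = [c for c in range(ncols) if a[c] is not None and b[c] is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum(((a[c] - b[c]) / sds[c]) ** 2 for c in shared) / len(shared))

    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c in range(ncols):
            if row[c] is None:
                # Donors observe attribute c; row i itself is excluded automatically.
                donors = sorted((dist(row, d), d[c]) for d in rows if d[c] is not None)
                nearest = [v for _, v in donors[:k]]
                out[i][c] = sum(nearest) / len(nearest)
    return out


def rmse(true_rows, masked_rows, imputed_rows, col):
    """Root-mean-square imputation error over the masked cells of one column."""
    errs = [(imp[col] - true[col]) ** 2
            for true, masked, imp in zip(true_rows, masked_rows, imputed_rows)
            if masked[col] is None]
    return math.sqrt(sum(errs) / len(errs))


if __name__ == "__main__":
    data = make_data(300)
    for mech in ("MCAR", "MAR", "NI"):
        masked = inject(data, mech)
        for name, fn in (("mean", mean_impute), ("kNN", knn_impute)):
            print(mech, name, round(rmse(data, masked, fn(masked), col=1), 3))
```

Running the script prints the imputation RMSE for attribute 1 under each mechanism and imputer. Under the MAR and NI masks used here, the hidden cells are concentrated among high values, so mean imputation, which substitutes the mean of the remaining lower-valued cases, tends to underestimate them; kNN can partly offset this by borrowing values from donors with similar observed attributes.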
Pages: 563–600 (38 pages)