gcimpute: A Package for Missing Data Imputation

Cited by: 0
Authors
Zhao, Yuxuan [1]
Udell, Madeleine [2]
Affiliations
[1] Cornell Univ, Dept Stat & Data Sci, Ithaca, NY 14850 USA
[2] Stanford Univ, Management Sci & Engn, Stanford, CA 94305 USA
Source
JOURNAL OF STATISTICAL SOFTWARE | 2024 / Volume 108 / Issue 04
Keywords
missing data; single imputation; multiple imputation; Gaussian copula; mixed data; imputation uncertainty; Python
DOI
10.18637/jss.v108.i04
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This article introduces the Python package gcimpute for missing data imputation. Package gcimpute can impute missing data with many different variable types, including continuous, binary, ordinal, count, and truncated values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. The package also provides specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation). This article describes the underlying methodology and demonstrates how to use the software package.
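The core idea in the abstract (match each marginal to its empirical distribution, then model dependence with a joint Gaussian) can be illustrated with a minimal two-variable sketch. This is not the gcimpute API or its fitting algorithm. The function names (`to_normal_scores`, `from_normal_score`, `impute_second_column`) and the toy data are hypothetical. The sketch assumes continuous data and estimates the latent correlation with a plain Pearson correlation of normal scores on complete rows.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def to_normal_scores(column):
    """Map observed entries to latent normal scores via the scaled empirical CDF."""
    observed = sorted(v for v in column if v is not None)
    n = len(observed)
    scores = []
    for v in column:
        if v is None:
            scores.append(None)
        else:
            rank = sum(1 for o in observed if o <= v)
            # Divide by n + 1 so the probability stays strictly inside (0, 1).
            scores.append(STD_NORMAL.inv_cdf(rank / (n + 1)))
    return scores, observed


def from_normal_score(z, observed):
    """Map a latent score back to the data scale using the nearest order statistic."""
    n = len(observed)
    k = round(STD_NORMAL.cdf(z) * (n + 1))
    return observed[min(n, max(1, k)) - 1]


def pearson(xs, ys):
    """Plain Pearson correlation (a stand-in for a proper copula fit)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def impute_second_column(rows):
    """Fill missing second-column entries with the latent conditional mean."""
    z1, _ = to_normal_scores([r[0] for r in rows])
    z2, observed2 = to_normal_scores([r[1] for r in rows])
    pairs = [(a, b) for a, b in zip(z1, z2) if a is not None and b is not None]
    rho = pearson([a for a, _ in pairs], [b for _, b in pairs])
    imputed = []
    for (x1, x2), za in zip(rows, z1):
        if x2 is None and za is not None:
            # For a standard bivariate Gaussian, E[z2 | z1] = rho * z1.
            x2 = from_normal_score(rho * za, observed2)
        imputed.append((x1, x2))
    return imputed


# Toy data (made up for illustration): the second column is the square of the
# first, with one entry missing.
rows = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, None), (8, 64), (9, 81)]
imputed = impute_second_column(rows)
print(imputed[6])
```

Because the marginals are handled through ranks, the imputed value is always one of the observed values of that variable, which is why a nonlinear but monotone relationship like the one above poses no problem for the Gaussian dependence model.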
Pages: 1-27
Number of pages: 27