gcimpute: A Package for Missing Data Imputation

Authors
Zhao, Yuxuan [1]
Udell, Madeleine [2]
Affiliations
[1] Cornell Univ, Dept Stat & Data Sci, Ithaca, NY 14850 USA
[2] Stanford Univ, Management Sci & Engn, Stanford, CA 94305 USA
Source
JOURNAL OF STATISTICAL SOFTWARE | 2024 / Volume 108 / Issue 04
Keywords
missing data; single imputation; multiple imputation; Gaussian copula; mixed data; imputation uncertainty; Python
DOI
10.18637/jss.v108.i04
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
This article introduces the Python package gcimpute for missing data imputation. Package gcimpute can impute missing data with many different variable types, including continuous, binary, ordinal, count, and truncated values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. The package also provides specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation). This article describes the underlying methodology and demonstrates how to use the software package.
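The Gaussian copula idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the gcimpute API: the helper names `normal_scores`, `correlation`, and `impute_y` are hypothetical, only two continuous variables are handled, and the latent correlation is estimated from complete pairs of normal scores rather than by the package's own fitting procedure. Each variable is mapped to a latent normal score through its empirical distribution, the missing latent value is replaced by its conditional mean given the observed one, and the result is mapped back through the empirical quantiles.

```python
from math import sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal used for the latent scale


def normal_scores(values):
    """Map observed values to latent normal scores via the scaled empirical CDF."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    scores = []
    for v in values:
        if v is None:
            scores.append(None)
        else:
            rank = sum(1 for o in observed if o <= v)
            # Dividing by n + 1 keeps the probability strictly inside (0, 1).
            scores.append(STD_NORMAL.inv_cdf(rank / (n + 1)))
    return scores, observed


def correlation(zx, zy):
    """Pearson correlation of the latent scores over rows where both are observed."""
    pairs = [(a, b) for a, b in zip(zx, zy) if a is not None and b is not None]
    mx = fmean(a for a, _ in pairs)
    my = fmean(b for _, b in pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    var_x = sum((a - mx) ** 2 for a, _ in pairs)
    var_y = sum((b - my) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)


def impute_y(x, y):
    """Impute missing entries of y from a fully observed x (assumption of this sketch)."""
    zx, _ = normal_scores(x)
    zy, y_observed = normal_scores(y)
    rho = correlation(zx, zy)
    imputed = []
    for z, value in zip(zx, y):
        if value is not None:
            imputed.append(value)
            continue
        # Conditional mean of the latent y-score given the latent x-score.
        u = STD_NORMAL.cdf(rho * z)
        # Map back through the empirical quantiles of the observed y values.
        idx = min(len(y_observed) - 1, int(u * len(y_observed)))
        imputed.append(y_observed[idx])
    return imputed


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [2.0, 4.0, 6.0, 8.0, None, 12.0, 14.0, 16.0, 18.0]
imputed = impute_y(x, y)
print(imputed)
```

Because the imputed value is read off the empirical quantiles, it always lies among the observed values of that variable, which is one way to see how the marginal distribution is kept while the dependence is handled on the Gaussian scale.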
Pages: 1 - 27
Page count: 27