Expanding Tidy Data Principles to Facilitate Missing Data Exploration, Visualization and Assessment of Imputations

Cited by: 47
Authors
Tierney, Nicholas [1, 2]
Cook, Dianne [1]
Affiliations
[1] Monash Univ, Melbourne, Vic, Australia
[2] Telethon Kids Inst, Perth, WA, Australia
Source
JOURNAL OF STATISTICAL SOFTWARE | 2023 / Vol. 105 / Issue 07
Keywords
statistical computing; statistical graphics; data science; data visualization; tidyverse; data pipeline; R; R-PACKAGE
DOI
10.18637/jss.v105.i07
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Despite the large body of research on missing value distributions and imputation, there is comparatively little literature with a focus on how to make it easy to handle, explore, and impute missing values in data. This paper addresses this gap. The new methodology builds upon tidy data principles, with the goal of integrating missing value handling as a key part of data analysis workflows. We define a new data structure, and a suite of new operations. Together, these provide a connected framework for handling, exploring, and imputing missing values. These methods are available in the R package naniar.
Pages: 1 - 31
Page count: 31
Related papers
50 in total
  • [1] Multiple imputations for missing data: a simulation with epidemiological data
    Nunes, Luciana Neves
    Klueck, Mariza Machado
    Guimaraes Fachel, Jandyra Maria
    [J]. CADERNOS DE SAUDE PUBLICA, 2009, 25 (02) : 268 - 278
  • [2] What Improves with Increased Missing Data Imputations?
    Bodner, Todd E.
    [J]. STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2008, 15 (04) : 651 - 675
  • [3] SEQUENTIAL IMPUTATIONS AND BAYESIAN MISSING DATA PROBLEMS
    KONG, A
    LIU, JS
    WONG, WH
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1994, 89 (425) : 278 - 288
  • [4] A note on determining the number of imputations for missing data
    Hershberger, SL
    Fisher, DG
    [J]. STRUCTURAL EQUATION MODELING, 2003, 10 (04) : 648 - 650
  • [5] Scaling Out Big Data Missing Value Imputations
    Anagnostopoulos, Christos
    Triantafillou, Peter
    [J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014 : 651 - 660
  • [6] Quality Assessment of Imputations in Administrative Data
    Schnetzer, Matthias
    Astleithner, Franz
    Cetkovic, Predrag
    Humer, Stefan
    Lenk, Manuela
    Moser, Mathias
    [J]. JOURNAL OF OFFICIAL STATISTICS, 2015, 31 (02) : 231 - 247
  • [7] Missing data, part 5. Introduction to multiple imputations
    Pham, Tra My
    Pandis, Nikolaos
    White, Ian R.
    [J]. AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2022, 162 (04) : 581 - 583
  • [8] A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data
    Wang, Earo
    Cook, Dianne
    Hyndman, Rob J.
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (03) : 466 - 478
  • [9] Inverse probability weighting or multiple imputations for nonmonotone missing data?
    Ross, Rachael
    Cole, Stephen
    Westreich, Daniel
    Daniels, Julie
    Stringer, Jeffrey
    Edwards, Jessie
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 : 3 - 3
  • [10] Linking missing data to study outcomes using multiple imputations
    Ibrahim, Khadija
    [J]. CANADIAN JOURNAL OF PUBLIC HEALTH-REVUE CANADIENNE DE SANTE PUBLIQUE, 2015, 106 (02) : E82 - E82