Missing Data Assumptions

Cited by: 20
Author
Little, Roderick J. [1]
Affiliation
[1] Univ Michigan, Dept Biostat, Ann Arbor, MI 48105 USA
Keywords
missing at random; ignorable missing data; Bayesian and frequentist inference; incomplete data; informative missingness; likelihood inference; missing-data mechanism; partially missing at random; maximum-likelihood estimation; drop-out; longitudinal data; imputation; regression; inference; models
DOI
10.1146/annurev-statistics-040720-031104
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
I review assumptions about the missing-data mechanisms that underlie methods for the statistical analysis of data with missing values. I describe Rubin's original definition of missing at random (MAR), its motivation and criticisms, and his sufficient conditions for ignoring the missingness mechanism for likelihood-based, Bayesian, and frequentist inference. Related definitions, including missing completely at random, always MAR, always missing completely at random, and partially MAR, are also covered. I present a formal argument for weakening Rubin's sufficient conditions for frequentist maximum likelihood inference with precision based on the observed information. Some simple examples of MAR are described, together with an example where the missingness mechanism can be ignored even though MAR does not hold. Alternative approaches to statistical inference based on the likelihood function are reviewed, along with non-likelihood frequentist approaches, including weighted generalized estimating equations. Connections with the causal inference literature are also discussed. Finally, alternatives to Rubin's MAR definition are discussed, including informative missingness, informative censoring, and coarsening at random. The intent is to provide a relatively nontechnical discussion, although some of the underlying issues are challenging and touch on fundamental questions of statistical inference.
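The abstract mentions simple examples of MAR and weighted estimating equations. As an illustration that is not taken from the review, the following minimal Python sketch simulates a hypothetical MAR mechanism in which the chance that y is observed depends only on a fully observed binary x. The data-generating model (x ~ Bernoulli(0.5), y = 2 + 3x + N(0, 1), response probabilities 0.8 and 0.3) and the function names simulate_mar, complete_case_mean and ipw_mean are assumptions chosen for illustration. The sketch contrasts the complete-case mean, which is biased here because the data are MAR but not MCAR, with an inverse-probability-weighted mean, the simplest case of a weighted estimating equation, sum_i r_i (y_i - mu) / p_hat(x_i) = 0.

```python
import random
from collections import defaultdict


def simulate_mar(n, seed=0):
    """Simulate (x, y, r) triples where y is missing at random given x.

    Hypothetical illustrative model: x ~ Bernoulli(0.5),
    y = 2 + 3x + N(0, 1), and the response indicator r depends only on
    the always-observed x, so the mechanism is MAR but not MCAR.
    """
    rng = random.Random(seed)
    respond_prob = {0: 0.8, 1: 0.3}  # assumed P(r = 1 | x)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)
        r = 1 if rng.random() < respond_prob[x] else 0
        # y is recorded as None when it is missing (r == 0)
        data.append((x, y if r else None, r))
    return data


def complete_case_mean(data):
    """Mean of y over cases with y observed; ignores the mechanism."""
    observed = [y for _, y, r in data if r]
    return sum(observed) / len(observed)


def ipw_mean(data):
    """Inverse-probability-weighted (Hajek) mean of y.

    P(r = 1 | x) is estimated by the observed response rate within each
    level of x, and each observed y is weighted by 1 / p_hat(x). This
    solves sum_i r_i * (y_i - mu) / p_hat(x_i) = 0 for mu.
    """
    counts = defaultdict(int)
    responders = defaultdict(int)
    for x, _, r in data:
        counts[x] += 1
        responders[x] += r
    p_hat = {x: responders[x] / counts[x] for x in counts}
    num = sum(y / p_hat[x] for x, y, r in data if r)
    den = sum(1.0 / p_hat[x] for x, _, r in data if r)
    return num / den


if __name__ == "__main__":
    data = simulate_mar(200_000, seed=1)
    print("complete-case mean:", round(complete_case_mean(data), 3))
    print("IPW mean:", round(ipw_mean(data), 3))
```

Under this assumed model the true mean of y is 3.5. The complete-case mean should land near 2 + 3 × 0.15/0.55 ≈ 2.82, because responders are disproportionately x = 0 cases, while the weighted mean should land near 3.5. With p_hat(x) estimated within each level of x, the weighted estimate reduces to the stratum-size-weighted average of the within-stratum observed means.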
Pages: 89–107
Page count: 19
Related papers (50 total)
  • [1] Missing Data and Convenient Assumptions
    Normand, Sharon-Lise T.
    [J]. CIRCULATION-CARDIOVASCULAR QUALITY AND OUTCOMES, 2010, 3 (01): 2–3
  • [2] Robustness to Parametric Assumptions in Missing Data Models
    Graham, Bryan S.
    Hirano, Keisuke
    [J]. AMERICAN ECONOMIC REVIEW, 2011, 101 (03): 538–543
  • [3] Missing data assumptions and methods in a smoking cessation study
    Barnes, Sunni A.
    Larsen, Michael D.
    Schroeder, Darrell
    Hanson, Andrew
    Decker, Paul A.
    [J]. ADDICTION, 2010, 105 (03): 431–437
  • [4] The Missing Data Assumptions of the NEAT Design and their Implications for Test Equating
    Sinharay, Sandip
    Holland, Paul W.
    [J]. PSYCHOMETRIKA, 2010, 75 (02): 309–327
  • [5] The impact of alternative assumptions on QALY calculations in the presence of missing data
    Plumpton, Catrin O.
    Hughes, Dyfrig A.
    [J]. TRIALS, 2019, 20
  • [6] Estimating ETAS: The effects of truncation, missing data, and model assumptions
    Seif, Stefanie
    Mignan, Arnaud
    Zechar, Jeremy Douglas
    Werner, Maximilian Jonas
    Wiemer, Stefan
    [J]. JOURNAL OF GEOPHYSICAL RESEARCH-SOLID EARTH, 2017, 122 (01): 449–469
  • [7] Fragmentary taxa, missing data, and ambiguity: Mistaken assumptions and conclusions
    Kearney, M
    [J]. SYSTEMATIC BIOLOGY, 2002, 51 (02): 369–381
  • [8] Identifiability assumptions for missing covariate data in failure time regression models
    Rathouz, Paul J.
    [J]. BIOSTATISTICS, 2007, 8 (02): 345–356
  • [9] Estimating treatment effects under untestable assumptions with nonignorable missing data
    Gomes, Manuel
    Kenward, Michael G.
    Grieve, Richard
    Carpenter, James
    [J]. STATISTICS IN MEDICINE, 2020, 39 (11): 1658–1674