Using Principal Components as Auxiliary Variables in Missing Data Estimation

Cited by: 109
Authors
Howard, Waylon J. [1 ]
Rhemtulla, Mijke [2 ]
Little, Todd D. [3 ,4 ]
Affiliations
[1] Seattle Childrens Hosp Res Inst, Ctr Child Hlth Behav & Dev, Seattle, WA 98121 USA
[2] Univ Amsterdam, Psychol Methods, NL-1012 WX Amsterdam, Netherlands
[3] Texas Tech Univ, Educ Psychol, Lubbock, TX 79409 USA
[4] Inst Measurement Methodol Anal & Policy, London, England
Funding
National Science Foundation (USA);
Keywords
STRUCTURAL EQUATION MODELS; MULTIPLE IMPUTATION; STRATEGIES; PERFORMANCE; REGRESSION;
DOI
10.1080/00273171.2014.999267
Chinese Library Classification
O1 [Mathematics];
Subject classification code
0701 ; 070101 ;
Abstract
To deal with missing data that arise due to participant nonresponse or attrition, methodologists have recommended an inclusive strategy in which a large set of auxiliary variables is used to inform the missing data process. In practice, the set of possible auxiliary variables is often too large. We propose using principal components analysis (PCA) to reduce the number of possible auxiliary variables to a manageable number. A series of Monte Carlo simulations compared the performance of the inclusive strategy with eight auxiliary variables (inclusive approach) to the PCA strategy using just one principal component derived from the eight original variables (PCA approach). We examined the influence of four independent variables (magnitude of correlations, rate of missing data, missing data mechanism, and sample size) on parameter bias, root mean squared error, and confidence interval coverage. Results indicate that the PCA approach yields unbiased parameter estimates and potentially greater accuracy than the inclusive approach. We conclude that using the PCA strategy to reduce the number of auxiliary variables is an effective and practical way to reap the benefits of the inclusive strategy in the presence of many possible auxiliary variables.
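The reduction step described in the abstract, collapsing eight auxiliary variables into a single principal component, can be illustrated with a minimal sketch. This is not the authors' code: the function name `first_principal_component`, the simulated data, and the assumption that the auxiliary variables themselves are fully observed are all illustrative choices made here.

```python
import numpy as np


def first_principal_component(aux):
    """Return scores on the first principal component of the columns of `aux`.

    Hypothetical helper for illustration; assumes `aux` has no missing values.
    """
    aux = np.asarray(aux, dtype=float)
    # Standardize each column so the PCA is based on the correlation matrix.
    z = (aux - aux.mean(axis=0)) / aux.std(axis=0, ddof=1)
    # Rows of vt are the component loadings, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]


# Simulated example (assumed values): eight auxiliary variables that
# share one common source of variation.
rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)
aux = 0.7 * common[:, None] + rng.normal(scale=0.7, size=(n, 8))

# One column of scores that could stand in for the eight originals.
pc1 = first_principal_component(aux)
```

The resulting `pc1` column would then be supplied as the single auxiliary variable to whatever missing data routine is in use (for example, as an extra predictor in an imputation model). The sign of a principal component is arbitrary, so only its absolute correlation with other variables is meaningful.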
Pages: 285-299
Page count: 15