The more data, the better? Demystifying deletion-based methods in linear regression with missing data

Authors
Xu, Tianchen [1]
Chen, Kun [2]
Li, Gen [3]
Affiliations
[1] Columbia Univ, Mailman Sch Publ Hlth, New York, NY 10032 USA
[2] Univ Connecticut, Dept Stat, Storrs, CT 06269 USA
[3] Univ Michigan, Sch Publ Hlth, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Asymptotic variance; Available-case analysis; Complete-case analysis; Missing data; LEAST-SQUARES; IMPUTATION; MODELS;
DOI
Not available
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
We compare two deletion-based methods for dealing with the problem of missing observations in linear regression analysis. One is the complete-case analysis (CC, or listwise deletion) that discards all incomplete observations and only uses common samples for ordinary least-squares estimation. The other is the available-case analysis (AC, or pairwise deletion) that utilizes all available data to estimate the covariance matrices and applies these matrices to construct the normal equation. We show that the estimates from both methods are asymptotically unbiased under missing completely at random (MCAR) and further compare their asymptotic variances in some typical situations. Surprisingly, using more data (i.e., AC) does not necessarily lead to better asymptotic efficiency in many scenarios. Missing patterns, covariance structure and true regression coefficient values all play a role in determining which is better. We further conduct simulation studies to corroborate the findings and demystify what has been missed or misinterpreted in the literature. Some detailed proofs and simulation results are available in the online supplemental materials.
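The two estimators described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: missing values are encoded as NaN, the function names `cc_estimate`, `pairwise_cov`, and `ac_estimate` are hypothetical, and the AC variant shown centers each pair of variables on the rows where both are observed (other pairwise-deletion variants exist).

```python
import numpy as np

def cc_estimate(X, y):
    """Complete-case (listwise deletion): drop any row with a NaN, then fit OLS."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    design = np.column_stack([np.ones(keep.sum()), X[keep]])  # add intercept
    beta, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return beta[1:]  # slope coefficients only

def pairwise_cov(Z):
    """Covariance matrix where entry (j, k) uses every row with both j and k observed."""
    p = Z.shape[1]
    S = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            ok = ~np.isnan(Z[:, j]) & ~np.isnan(Z[:, k])
            a, b = Z[ok, j], Z[ok, k]
            S[j, k] = S[k, j] = np.mean((a - a.mean()) * (b - b.mean()))
    return S

def ac_estimate(X, y):
    """Available-case (pairwise deletion): solve the normal equation S_xx b = S_xy."""
    p = X.shape[1]
    S = pairwise_cov(np.column_stack([X, y]))
    return np.linalg.solve(S[:p, :p], S[:p, p])

if __name__ == "__main__":
    # Illustrative MCAR setup (assumed values, not taken from the paper).
    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 2))
    y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
    X[rng.random((n, 2)) < 0.2] = np.nan  # each covariate entry missing w.p. 0.2
    print("CC slopes:", cc_estimate(X, y))
    print("AC slopes:", ac_estimate(X, y))
```

With fully observed data the two functions return the same slopes, since both reduce to ordinary least squares; they differ only once entries are missing, which is the regime whose asymptotic variances the paper compares.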
Pages: 515-526
Number of pages: 12