Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data

Authors
Thongsri, Thidarat [1 ,2 ]
Samart, Klairung [1 ,2 ]
Affiliations
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Hat Yai, Thailand
[2] Prince Songkla Univ, Fac Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
Composite method; imputation method; missing data; missing at random; multiple linear regression; hot deck imputation
DOI
Not available
Chinese Library Classification number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Missing data are common in data collection, and ignoring the problem can lead to unreliable conclusions. Our research objective is to develop a method for handling data that are missing at random in both the response and the independent variables of a multiple linear regression, and to compare its efficiency with that of existing techniques. Five existing techniques were considered: listwise deletion (LD), hot deck imputation (HD), predictive mean matching imputation (PMM), stochastic regression imputation (SR), and random forest imputation (RF). We compare them with the proposed composite imputation method, stochastic regression random forest with equivalent weight (SREW), which combines the stochastic regression and random forest methods using the equivalent weight method. Monte Carlo simulations were run with sample sizes of 30, 60, 90, 120 and 150, missing percentages of 10%, 20%, 30% and 40%, and error standard deviations of 1, 3 and 5. The criterion for comparing efficiency is the average mean square error (AMSE). The results show that SREW is the most efficient in all situations, whereas hot deck imputation gives the highest AMSE in almost all cases, especially when the missing percentage is high.
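The abstract gives no implementation details for SREW, so the following is only an illustrative sketch of the general idea of averaging a stochastic regression imputation and a tree-ensemble imputation with equal weights. Several things here are assumptions rather than the authors' method: the weight of 0.5 as a reading of "equivalent weight", the restriction to one predictor with missing values only in the response, and the use of bagged regression trees as a standard-library stand-in for random forest (with a single predictor there are no features to subsample, so bagging is the closest analogue). All function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def sr_impute(xs, ys, x_new, rng):
    """Stochastic regression: fitted value plus a random normal residual."""
    a, b, sd = fit_line(xs, ys)
    return [a + b * x + rng.gauss(0.0, sd) for x in x_new]


def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(pairs, min_leaf=3):
    """Regression tree on one predictor; pairs are (x, y) sorted by x.
    A leaf is a float (mean of y); a node is (threshold, left, right)."""
    ys = [y for _, y in pairs]
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:  # too few points, or no valid split position
        return statistics.fmean(ys)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return (threshold, grow_tree(pairs[:i], min_leaf), grow_tree(pairs[i:], min_leaf))


def predict_tree(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def bagged_tree_impute(xs, ys, x_new, rng, n_trees=50):
    """Stand-in for random forest: average of trees grown on bootstrap samples."""
    data = list(zip(xs, ys))
    trees = [grow_tree(sorted(rng.choices(data, k=len(data)))) for _ in range(n_trees)]
    return [statistics.fmean([predict_tree(t, x) for t in trees]) for x in x_new]


def srew_sketch(xs, ys, rng, w=0.5):
    """Fill each missing y (None) with w * SR value + (1 - w) * tree-ensemble value.
    w = 0.5 is an assumed reading of 'equivalent weight'."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    missing_x = [x for x, y in zip(xs, ys) if y is None]
    ox, oy = zip(*observed)
    sr = sr_impute(ox, oy, missing_x, rng)
    rf = bagged_tree_impute(ox, oy, missing_x, rng)
    combined = iter([w * s + (1 - w) * r for s, r in zip(sr, rf)])
    return [next(combined) if y is None else y for y in ys]


# Toy data: the chance that y is missing depends on the observed x,
# which is one simple way to produce missing-at-random responses.
rng = random.Random(0)
xs = [rng.random() for _ in range(60)]
truth = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
ys = [None if rng.random() < (0.35 if x > 0.5 else 0.05) else y
      for x, y in zip(xs, truth)]
completed = srew_sketch(xs, ys, rng)
```

Averaging the two imputations keeps part of the random residual that stochastic regression adds while borrowing the tree ensemble's ability to follow nonlinear patterns. Whether this matches the weighting used in the paper cannot be confirmed from the abstract alone.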
Pages: 51-62
Number of pages: 12