Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data

Cited by: 0
Authors
Thongsri, Thidarat [1 ,2 ]
Samart, Klairung [1 ,2 ]
Affiliations
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Hat Yai, Thailand
[2] Prince Songkla Univ, Fac Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
Composite method; imputation method; missing data; missing at random; multiple linear regression; hot deck imputation
DOI
Not available
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Missing data are common in data collection, and ignoring the problem can lead to unreliable conclusions. Our research objective is to develop a method for handling data that are missing at random in both the response and the independent variables of a multiple linear regression, and to compare its efficiency with that of existing techniques. Five existing techniques were employed: listwise deletion (LD), hot deck imputation (HD), predictive mean matching imputation (PMM), stochastic regression imputation (SR), and random forest imputation (RF). We compare them with the proposed composite imputation method, stochastic regression random forest with equivalent weight (SREW), which combines the stochastic regression and random forest methods using the equivalent weighted method. Monte Carlo simulations were run with sample sizes of 30, 60, 90, 120 and 150, missing percentages of 10%, 20%, 30% and 40%, and error standard deviations of 1, 3 and 5. The criterion for comparing efficiency is the average mean square error (AMSE). The results show that SREW is the most efficient method in all situations, whereas hot deck imputation gives the highest AMSE in almost all cases, especially when the missing percentage is high.
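The composite idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "equivalent weight" means a weight of 1/2 on each component, it uses a single predictor for the stochastic regression step, and it takes the random forest imputations as given (from any random forest implementation). All function names are hypothetical, and the AMSE is assumed here to be the mean of the per-replication mean square errors.

```python
import random
import statistics


def stochastic_regression_impute(x_obs, y_obs, x_missing, rng):
    """Impute y at x_missing from a one-predictor least-squares fit plus noise."""
    mean_x = statistics.fmean(x_obs)
    mean_y = statistics.fmean(y_obs)
    sxx = sum((x - mean_x) ** 2 for x in x_obs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_obs, y_obs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(x_obs, y_obs)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (len(x_obs) - 2)) ** 0.5
    # The "stochastic" part: add a random normal residual to each prediction.
    return [intercept + slope * x + rng.gauss(0.0, sigma) for x in x_missing]


def combine_equal_weight(sr_values, rf_values):
    """Assumed composite step: average the two imputations with weights 1/2 and 1/2."""
    return [0.5 * sr + 0.5 * rf for sr, rf in zip(sr_values, rf_values)]


def average_mse(mse_per_replication):
    """Assumed AMSE: mean of the mean square errors over simulation replications."""
    return statistics.fmean(mse_per_replication)


if __name__ == "__main__":
    rng = random.Random(0)
    x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
    x_missing = [2.5, 4.5]
    sr_values = stochastic_regression_impute(x_obs, y_obs, x_missing, rng)
    rf_values = [5.0, 9.0]  # placeholder values standing in for random forest output
    print(combine_equal_weight(sr_values, rf_values))
```

Averaging the two imputations keeps the random residual from the stochastic regression step while damping it with the random forest prediction; whether this matches the weighting used in the paper cannot be confirmed from the abstract alone.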
Pages: 51-62
Number of pages: 12