Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data

Authors
Thongsri, Thidarat [1,2]
Samart, Klairung [1,2]
Affiliations
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Hat Yai, Thailand
[2] Prince Songkla Univ, Fac Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
Composite method; imputation method; missing data; missing at random; multiple linear regression; hot deck imputation
DOI
Not available
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Missing data are common in data collection, and ignoring the problem can lead to unreliable conclusions. Our objective is to develop a method for handling data that are missing at random in both the response and the independent variables of a multiple linear regression, and to compare its efficiency with that of existing techniques. Five existing techniques were considered: listwise deletion (LD), hot deck imputation (HD), predictive mean matching imputation (PMM), stochastic regression imputation (SR), and random forest imputation (RF). We compare them with the proposed composite imputation method, stochastic regression random forest with equivalent weight (SREW), which combines the stochastic regression and random forest methods using the equivalent weighted method. Monte Carlo simulations were run with sample sizes of 30, 60, 90, 120, and 150; missing percentages of 10%, 20%, 30%, and 40%; and error standard deviations of 1, 3, and 5. Efficiency is compared using the average mean square error (AMSE). The results show that SREW is the most efficient method in all situations, whereas hot deck imputation gives the highest AMSE in almost all cases, especially when the missing percentage is high.
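The combining step and the comparison criterion can be sketched in a few lines. This is only an illustrative reading of the abstract, not the authors' implementation. It assumes that "equivalent weight" means each of the two imputed values (SR and RF) gets weight 0.5, that the MSE of one simulation run is the mean squared difference between observed and fitted values, and that AMSE is the plain average of those MSEs over Monte Carlo replications. The function names are hypothetical.

```python
from statistics import fmean


def equal_weight_combine(sr_values, rf_values):
    """Combine two imputations of the same missing cells.

    Assumption: "equivalent weight" is read as weight 0.5 for the
    stochastic regression (SR) value and 0.5 for the random forest
    (RF) value. The abstract does not spell out the weights.
    """
    if len(sr_values) != len(rf_values):
        raise ValueError("both imputation vectors must have the same length")
    return [0.5 * s + 0.5 * r for s, r in zip(sr_values, rf_values)]


def mse(observed, fitted):
    """Mean squared error of one fitted regression (assumed definition)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return fmean([(y - f) ** 2 for y, f in zip(observed, fitted)])


def amse(mse_per_replication):
    """Average the per-replication MSEs over all Monte Carlo runs."""
    return fmean(mse_per_replication)


# Toy numbers, made up purely to show the calls.
combined = equal_weight_combine([2.0, 4.0], [3.0, 6.0])
run_mse = mse([1.0, 2.0], [1.0, 4.0])
overall = amse([2.0, 4.0])
```

With the toy inputs above, `combined` is the element-wise midpoint of the two imputation vectors. A lower `overall` value would indicate a more efficient method under the AMSE criterion.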
Pages: 51-62
Page count: 12